bio-tools / biotoolsRegistry

biotoolsregistry: discovery portal for bioinformatics
GNU General Public License v3.0

Broken link detection #88

Closed: joncison closed this issue 6 years ago

joncison commented 8 years ago

Federico says ...

"I think there should be some sort of automatic “broken link” detection. Once the registry is open to everybody and the number of registered services is huge, it will be impossible to curate it manually by checking that every tool is still alive. A color code could help distinguish links that have appeared broken for just a few hours from those broken for days or months. After a service has been unreachable for a predefined amount of time, its status should change to closed (or something similar). Another useful feature would be to send an email alert to the service contacts (if they opt in) when the service appears to have been down for longer than a predetermined number of hours."
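The behaviour Federico describes (color-coding by how long a link has been down, closing the entry after a threshold, and optional email alerts to opted-in contacts) can be sketched as a small classification step. This is a minimal sketch only: the status names, colors and thresholds (1 day, 30 days, 90 days before closing, 48 hours before alerting) are hypothetical values chosen for illustration, not anything agreed in this issue.

```python
from datetime import timedelta
from enum import Enum


class LinkStatus(Enum):
    """Hypothetical color-coded states for a monitored link."""
    UP = "green"
    RECENTLY_DOWN = "yellow"   # unreachable for hours
    DOWN = "orange"            # unreachable for days
    LONG_DOWN = "red"          # unreachable for weeks or months
    CLOSED = "closed"          # past the (assumed) closure threshold


# All thresholds below are illustrative assumptions, not agreed values.
DAY = timedelta(days=1)
MONTH = timedelta(days=30)
CLOSE_AFTER = timedelta(days=90)
ALERT_AFTER = timedelta(hours=48)


def classify(downtime: timedelta) -> LinkStatus:
    """Map how long a link has been unreachable to a color-coded status."""
    if downtime <= timedelta(0):
        return LinkStatus.UP
    if downtime >= CLOSE_AFTER:
        return LinkStatus.CLOSED
    if downtime < DAY:
        return LinkStatus.RECENTLY_DOWN
    if downtime < MONTH:
        return LinkStatus.DOWN
    return LinkStatus.LONG_DOWN


def should_alert(downtime: timedelta, contact_opted_in: bool) -> bool:
    """Email the service contacts only if they opted in and the outage is long enough."""
    return contact_opted_in and downtime >= ALERT_AFTER
```

Keeping the thresholds as module-level constants makes it easy to tune them once real outage data is available.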

joncison commented 7 years ago

This is part of the automated QA / QC (https://docs.google.com/document/d/1ATj2zJOlbR3Edk6QyGvPX5HStZBknqfx1Fwqk4k0kqE/edit)

joncison commented 7 years ago

Link checking has been added to routine QA/QC checks; once existing errors are systematically fixed we can revisit the colour-coding of links etc.

dansondergaard commented 7 years ago

We've run into a bunch of dead homepage links while packaging tools here in Aarhus. I agree that automated link checking must be implemented, and I think we have the expertise and resources here in Aarhus to do it: we have three student programmers working on bio.tools and packaging.

I'd like to make the suggestion more concrete:

I suggest the development of a simple service which periodically (e.g. monthly) checks all homepage links in the bio.tools database. IMO the service should be completely separate from the bio.tools web application.

The service will:

Furthermore:

Potentially, the service could also store a timestamp for when the tool was last checked.
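A minimal Python sketch of the standalone checker proposed above, assuming the tool homepages have already been pulled from bio.tools into a plain `{tool_id: url}` dict (how they are fetched is left out). `check_all`, `http_status` and `CheckResult` are hypothetical names; scheduling (e.g. a monthly cron job) and persistent storage are out of scope. The fetch function is injectable so the logic can be tested without network access.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Optional


@dataclass
class CheckResult:
    url: str
    alive: bool
    status: Optional[int]   # HTTP status code, or None if no response was received
    checked_at: datetime    # the "last checked" timestamp suggested above


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request; raises OSError on network errors."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still mean the host answered; report the code
        return err.code


def check_all(
    homepages: Dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, CheckResult]:
    """Check every homepage once and record the outcome with a UTC timestamp."""
    results: Dict[str, CheckResult] = {}
    for tool_id, url in homepages.items():
        now = datetime.now(timezone.utc)
        try:
            code: Optional[int] = fetch(url)
            alive = 200 <= code < 400
        except (OSError, ValueError):
            # URLError and timeouts are OSError subclasses; malformed URLs raise ValueError
            code, alive = None, False
        results[tool_id] = CheckResult(url, alive, code, now)
    return results
```

Treating any 2xx/3xx response as "alive" is a simplification. Some servers answer `HEAD` with 405, so a production checker would probably retry with `GET` before flagging a link as dead.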

Implementing this as a standalone service keeps the bio.tools code clean. It's also an approachable project for a student programmer.

I noticed that the bio.tools proposal hints at a general QA/QC mechanism, or at least a generic way of storing this type of information in the database and showing it to users. It should be no problem integrating with such a mechanism.

What do you think?

jlgelpi commented 7 years ago

We have this automatic checking implemented at OpenEBench (Tools Monitoring section); see the green/red dots at https://elixir.bsc.es/elixibilitas/. We can provide the list of dead links and errors as a REST endpoint, and you can then follow up with the authors as you described. @redmitry can give more details.
This is still work in progress; the plan is to also monitor "last-seen" time and "max time active".
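One possible way to derive the two planned metrics from a series of periodic checks is sketched below. The definitions are assumptions, not OpenEBench's: "last-seen" is taken as the time of the most recent successful check, and "max time active" as the longest span between the first and last check of an unbroken run of successful checks. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# One entry per periodic check: (time of the check, was the homepage reachable?)
History = List[Tuple[datetime, bool]]


def last_seen(history: History) -> Optional[datetime]:
    """Time of the most recent successful check, or None if never reachable."""
    ok_times = [t for t, alive in history if alive]
    return max(ok_times) if ok_times else None


def max_time_active(history: History) -> timedelta:
    """Longest span covered by consecutive successful checks (assumed definition)."""
    best = timedelta(0)
    run_start: Optional[datetime] = None
    for t, alive in sorted(history):
        if alive:
            if run_start is None:
                run_start = t          # a new run of successes begins here
            best = max(best, t - run_start)
        else:
            run_start = None           # a failed check breaks the run
    return best
```

With this definition, a single isolated successful check counts as zero active time, because uptime is only measured between checks; a different convention (e.g. adding one check interval) would be equally reasonable.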

redmitry commented 7 years ago

Hello Dan,

We periodically check the homepages of all tools in bio.tools (every 6 hours).

[update: 23 Nov 17]
URL pattern: https://openebench.bsc.es/monitor/metrics/bio.tools:{id}/{type}/{host}/project/website/operational
Example: https://openebench.bsc.es/monitor/metrics/bio.tools:3d-fun/web/3dfun.bioinfo.pl/project/website/operational

Cheers,
Dmitry
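The endpoint pattern Dmitry posted can be filled in programmatically. This sketch only builds the URL from the `{id}`, `{type}` and `{host}` placeholders and reproduces his 3d-fun example; the function name is hypothetical, and since the response format is not described in this thread, no parsing of the result is attempted.

```python
from urllib.parse import quote

BASE = "https://openebench.bsc.es/monitor/metrics"


def operational_metric_url(tool_id: str, tool_type: str, host: str) -> str:
    """Fill the bio.tools:{id}/{type}/{host} template from this thread."""
    # Percent-encode each part with safe="" so it always stays a single path segment
    segments = [quote(tool_id, safe=""), quote(tool_type, safe=""), quote(host, safe="")]
    return f"{BASE}/bio.tools:{segments[0]}/{segments[1]}/{segments[2]}/project/website/operational"
```

The resulting URL could then be fetched with any HTTP client; whether the endpoint is still served at this path is not something the thread establishes.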

dansondergaard commented 7 years ago

@jlgelpi @redmitry That's great, a REST endpoint would be ideal.

@joncison I'd be interested in assigning this to one of our student programmers, but they would need access to the code base and probably a short introduction to how it's organised (which could be given in person in Paris).

joncison commented 7 years ago

Implementation as proposed, in bio.tools, would be a really nice contribution.

@ekry: please take a look at this thread and arrange code access for Dan and his students, as needed. Dan, your team will need to coordinate with Emil. This will have to wait until the deliverables are done (from September on).

joncison commented 6 years ago

Bumping priority: this is very closely related to https://github.com/bio-tools/biotoolsregistry/issues/223 and https://github.com/bio-tools/biotoolsregistry/issues/207

joncison commented 6 years ago

This issue was moved to bio-tools/biotoolsLint#12