bio-tools / biotoolsLint

Utility for verification of bio.tools content with reporting

Broken link detection #12

Open joncison opened 6 years ago

joncison commented 6 years ago

From @joncison on March 16, 2016 12:51

Federico says ...

"I think there should be some sort of automatic “broken link” detection. In fact, when the registry will be open and available to everybody and the number of registered service will be huge, it will be impossible to manually curate the registry checking that all the tools are still alive. Some sort of color code could help distinguishing links that appear to be broken from just few hours, or days or months… After a service has been unreachable for a pre-defined amount of time its status should change to closed (or something similar). Another interesting function could be to send an e-mail alert to the service contacts (if they want) when the service appears to be down for longer than a pre-determined number of hours."

Copied from original issue: bio-tools/biotoolsRegistry#88
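
To make Federico's colour-code idea concrete, here is a minimal sketch of mapping "how long has this link been failing" to a display status. The thresholds, the colour names, and the function name `link_status` are assumptions for illustration only; the thread does not fix any of these values.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical thresholds: the issue only speaks of "hours, or days or months"
# and a "pre-defined amount of time", so these numbers are placeholders.
THRESHOLDS = [
    (timedelta(days=1), "yellow"),    # failing for less than a day
    (timedelta(days=30), "orange"),   # failing for less than a month
    (timedelta(days=180), "red"),     # failing for less than about six months
]


def link_status(first_failure: Optional[datetime], now: datetime) -> str:
    """Map the duration of an outage to a colour, or to "closed"."""
    if first_failure is None:
        return "green"  # no failure recorded, link is considered alive
    downtime = now - first_failure
    for limit, colour in THRESHOLDS:
        if downtime < limit:
            return colour
    return "closed"  # unreachable for longer than the last threshold


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(link_status(now - timedelta(hours=3), now))
```

The same downtime value could also drive the optional e-mail alert, by comparing it against a per-contact number of hours.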

joncison commented 6 years ago

This is part of the automated QA / QC (https://docs.google.com/document/d/1ATj2zJOlbR3Edk6QyGvPX5HStZBknqfx1Fwqk4k0kqE/edit)

joncison commented 6 years ago

Link checking has been added to routine QA/QC checks; once existing errors are systematically fixed we can revisit the colour-coding of links etc.

joncison commented 6 years ago

From @dansondergaard on August 10, 2017 11:31

We've experienced a bunch of dead homepage links while packaging tools here in Aarhus. I agree that an automated link-checking approach must be implemented and I think that we have the expertise and resources here in Aarhus to do it, since we have three student programmers working on bio.tools and packaging.

I'd like to make the suggestion more concrete:

I suggest the development of a simple service which periodically (e.g. monthly) checks all homepage links in the bio.tools database. IMO the service should be completely separate from the bio.tools web application.

The service will:

Furthermore:

Potentially, the service could also store a timestamp for when the tool was last checked.

Implementing this as a standalone service keeps the bio.tools code clean. It's also an approachable project for a student programmer.
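
A rough sketch of what such a standalone checker could look like, using only the Python standard library. Everything here is an assumption for illustration: the names `CheckResult`, `http_status`, `check_homepage` and `check_all`, the rule that any 2xx/3xx response counts as alive, and the input format (a mapping from tool ID to homepage URL). How the homepages are read from bio.tools and where results are stored are left out on purpose.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass
class CheckResult:
    url: str
    ok: bool
    detail: str           # HTTP status code, or a short error description
    checked_at: datetime  # when this check ran (the "last checked" timestamp)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request; raise on network-level failure."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def check_homepage(url: str, fetch: Callable[[str], int] = http_status) -> CheckResult:
    """Check one homepage and record the outcome with a UTC timestamp."""
    now = datetime.now(timezone.utc)
    try:
        status = fetch(url)
    except Exception as err:  # DNS failure, timeout, refused connection, bad URL
        return CheckResult(url, False, f"error: {err}", now)
    return CheckResult(url, 200 <= status < 400, str(status), now)


def check_all(homepages: Dict[str, str],
              fetch: Callable[[str], int] = http_status) -> Dict[str, CheckResult]:
    """Check every tool's homepage; keys are tool IDs."""
    return {tool_id: check_homepage(url, fetch) for tool_id, url in homepages.items()}
```

Running `check_all` from a scheduler (cron, for example) would give the periodic behaviour. The `fetch` parameter is injectable so the logic can be tested without network access.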

I noticed that the bio.tools proposal hints at a general QA/QC mechanism, or at least a generic way of storing this type of information in the database and showing it to users. It should be no problem integrating with such a mechanism.

What do you think?

joncison commented 6 years ago

From @jlgelpi on August 10, 2017 11:56

We have this automatic checking functionality implemented at OpenEBench (Tools Monitoring section). See the green/red dots at https://elixir.bsc.es/elixibilitas/. We can provide the list of dead links and errors obtained as a REST endpoint. Then you can deal with authors in the way you said. @redmitry can give more details.
This is still ongoing work; the plan is also to monitor "last-seen" time and "max time active".

joncison commented 6 years ago

From @redmitry on August 10, 2017 12:0

Hello Dan,

We periodically check (every 6 h) the homepages of all bio.tools tools.

[update: 23 nov 17]

https://openebench.bsc.es/monitor/metrics/biotools:{id}/{type}/{host}/project/website/operational

https://openebench.bsc.es/monitor/metrics/biotools:3d-fun/web/3dfun.bioinfo.pl/project/website/operational

Cheers,
Dmitry
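
For anyone consuming this from a script, the URL pattern above can be filled in as in the sketch below. The function name `operational_metric_url` is made up for illustration, and the response format is not described in this thread, so the sketch only builds the URL and does not parse anything.

```python
from urllib.parse import quote

# Base path taken from the URLs quoted in this comment.
BASE = "https://openebench.bsc.es/monitor/metrics"


def operational_metric_url(tool_id: str, tool_type: str, host: str) -> str:
    """Fill the biotools:{id}/{type}/{host} pattern for the 'operational' metric."""
    segments = [f"biotools:{tool_id}", tool_type, host]
    # Percent-encode each segment, keeping the ':' in 'biotools:' literal.
    path = "/".join(quote(segment, safe=":") for segment in segments)
    return f"{BASE}/{path}/project/website/operational"


if __name__ == "__main__":
    print(operational_metric_url("3d-fun", "web", "3dfun.bioinfo.pl"))
```

With the values from the example above, this reproduces the second URL exactly.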

joncison commented 6 years ago

From @dansondergaard on August 10, 2017 12:19

@jlgelpi @redmitry That's great, a REST endpoint would be ideal.

@joncison I'd be interested in assigning this to one of our student programmers, but they would need access to the code base and potentially a small introduction to how it's organised (could be given in-person in Paris).

joncison commented 6 years ago

Implementation as proposed, in bio.tools, would be a really nice contribution.

@ekry: please take a look at this thread and arrange code access for Dan and his students, as needed. Dan, your guys will need to coordinate with Emil. This will have to wait until the deliverables are done (from September on).

joncison commented 6 years ago

Bumping priority: this is very closely related to https://github.com/bio-tools/biotoolsregistry/issues/223 and https://github.com/bio-tools/biotoolsregistry/issues/207