Open kltm opened 2 years ago
@vanaukenk @dustine32 I'm not sure how large an issue this actually is, as the "correct" behavior we are set up for at this point should be that /no/ identifiers that are not from model.geneontology.org work to link into noctua, as the curation system should be "internal". If this is something we want, we probably want to add it as a separately specced feature. Otherwise, I'd like to set it to wontfix, as the issue should age out naturally over the next month or two as we move noctua to AWS.
IDs starting with R-HSA are Reactome IDs. Could it be that the problem is outside of GO-CAM and is happening at Reactome?
@thomaspd I'm using Reactome IDs as that's what was discovered; we'd have to sample the rest of the ID spaces to see if it was just them and not something more general before digging in. Unless Reactome was linking to some pretty weird internal internals of ours, it seems like something an aggressive bot might do, as we've already seen that behavior with the API. If this problem doesn't disappear with the next devops update, it might be worth digging in and seeing where traffic is coming from.
Thanks for the clarification. I just wanted to make sure it wasn’t something outside our control.
In some cases, when searching for identifiers in GO-CAMs on Google, toaster is revealed as a secondary hit, rather than having the appropriate name. For example, as of this writing, R-HSA-9686347 does /not/ have this issue while R-HSA-3968437 does.
I believe that this comes from bots examining the JS code to find more URLs, as these are never presented as links. We've seen this before.
This is not great, mostly because I was hoping that no major bots would be able to get into the noctua system at all, for any reason, unless coming in from models.geneontology.org: we currently have a robots.txt in place which should be telling them all to go away. That may actually be in effect, which would explain why it works for some IDs and not others: these may be old results that will eventually age out.
This should be naturally fixed by switching over to the new devops setting, which is underway (https://github.com/berkeleybop/bbops/issues/12), and having new URLs take over as the old ones age out.
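
For anyone wanting to sanity-check what the robots.txt mentioned above tells crawlers, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content (a blanket disallow) and the example URL are assumptions for illustration, not the actual deployed file or a real noctua path.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: a blanket "go away" for every crawler.
# The file actually deployed may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical URL, for illustration only; not a real noctua address.
EXAMPLE_URL = "http://noctua.example.org/editor/graph/R-HSA-3968437"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "bingbot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, EXAMPLE_URL))
```

Note that robots.txt is advisory: it only tells well-behaved crawlers what not to fetch, so an aggressive bot can ignore it.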