Open pgaudet opened 1 year ago
reported by @ccasalsc
What SOP do you use to determine if a gene product is present?
It seems to be there as expected:
Yes in fact, it is there, I didn't describe the issue clearly.
We expect to be able to find the entry by ID, but when you search P00509 nothing gets returned. Is that expected?
This shows up in the "general" autocomplete in the graph editor, but not on the landing page. Possibly some restriction is in place? This may be a software issue and not a NEO load issue.
Looking more closely:
P00509 (the E. coli entry) can be added in the Graph Editor using 'Add Individual', but not when using the 'Add Annoton -> enabled by'. Likewise for the Form editor: P00509 does not autocomplete.
You can compare the behavior with human entry P99999, this works everywhere.
So, it seems this ID can be found but Noctua doesn't know it's an 'entity'.
Thanks, Pascale
@pgaudet Well, that may actually be it and this should be bumped back over to the NEO repo. Before doing so, it would be good to get feedback from @balhoff to double check if it's getting marked correctly.
Just looked at the bacterial and human entities in noctua-amigo and they do seem to have different parentage:
I'm not sure why that is, though.
As an aside, should this ticket go into the Noctua maintenance project?
@vanaukenk Now that we have a feel that this is data and loading vs. "noctua" software, I'd tend towards data/qc. Honestly, it doesn't matter too too much for me how it gets accounted for.
@pgaudet I just wanted to check that you were still having this issue? I can no longer reproduce from what is written above in the graph editor:
P00509 is not found in the landing page
Nor is it found in the Add Annoton box:
However it is found in 'Add individual'
This looks the same as in October https://github.com/geneontology/neo/issues/111#issuecomment-1287332520
@pgaudet From my example above, it is found in enabled_by, but you'd have to search by label or the full identifier (i.e. UniProtKB:P00509). Search has historically never supported searching by the internal-only portion (an issue from way back); the general input in the graph editor is an exception to this.
Thanks @kltm. It's super confusing that the behavior is different from human entries (e.g. Q06187) and different in the various search boxes. Still looks like a bug to me.
I'd bet that those are actual synonyms that are being included; in the general search, it is breaking things up on its own.
The UniProt ID is stored as a synonym?
If that's the case can we do that for all species?
This isn't a case of species or type; it's about what is made available in the synonym field (I suspect). The general doc was specially created some time ago to just take things and grind them up without structure (including creating its own "synonyms" by cutting identifiers up) so it would kind of work for everything, but it has little structure connected to it, can't really be used for filtering, and has no closures. The doc type used for most of the search boxes (everything but the ubernoodle) is structured and can be used for filtering with closures, but it does not have the grab-bag element, so it can only go off of what synonyms are offered.
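To make the difference concrete, here is a rough Python sketch of the two behaviors as described above. It is not the actual Noctua/AmiGO indexing code: the function names, the prefix-match autocomplete, the placeholder labels, and the synonym lists are all assumptions for illustration. In particular, whether the human entry really carries its bare accession as a synonym is only a suspicion at this point.

```python
def general_doc_terms(curie, label, synonyms):
    """Hypothetical 'grab bag' doc: index every field, plus pieces of the identifier."""
    terms = {curie, label, *synonyms}
    # Cut the identifier up so the local part is searchable on its own.
    if ":" in curie:
        prefix, local_id = curie.split(":", 1)
        terms.update({prefix, local_id})
    return {t.lower() for t in terms}


def structured_doc_terms(curie, label, synonyms):
    """Hypothetical structured doc: only the fields that were actually supplied."""
    return {t.lower() for t in {curie, label, *synonyms}}


def matches(query, terms):
    """Assumed autocomplete rule: case-insensitive prefix match on any indexed term."""
    q = query.lower()
    return any(t.startswith(q) for t in terms)


# Placeholder records; labels and synonym lists are made up for the example.
ecoli = ("UniProtKB:P00509", "ecoli example label", [])
human = ("UniProtKB:Q06187", "human example label", ["Q06187"])

print(matches("P00509", general_doc_terms(*ecoli)))               # general input
print(matches("P00509", structured_doc_terms(*ecoli)))            # bare accession, no synonym
print(matches("UniProtKB:P00509", structured_doc_terms(*ecoli)))  # full identifier
print(matches("Q06187", structured_doc_terms(*human)))            # bare accession offered as synonym
```

Under these assumptions, the bare accession only matches in the structured doc when it has been supplied as a synonym, which would explain why the two entries behave differently in the same search box.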
Are all genes processed the same way? Sorry, I still don't understand why a human UniProt ID behaves one way and an E. coli UniProt ID behaves differently.
@pgaudet It might be easiest to just have this on the agenda for our next call; we can walk through a couple of examples.
Hi, None of the entries that correspond to NCBI:83333 can be found in Noctua; are these being filtered somehow?
Thanks, Pascale