geneontology / neo

noctua entity ontology

Some E. coli entries not found #111

Open pgaudet opened 1 year ago

pgaudet commented 1 year ago

Hi, none of the entries that correspond to NCBI:83333 can be found in Noctua; are these being filtered somehow?

Thanks, Pascale

pgaudet commented 1 year ago

See for example P00509

pgaudet commented 1 year ago

reported by @ccasalsc

cmungall commented 1 year ago

What SOP do you use to determine if a gene product is present?

It seems to be there as expected:

image

pgaudet commented 1 year ago

Yes, in fact it is there; I didn't describe the issue clearly.

We expect to be able to find the entry by ID, but when you search for P00509, nothing is returned. Is that expected?

kltm commented 1 year ago

This shows up in the "general" autocomplete in the graph editor, but not on the landing page. Possibly some restriction is in place? This may be a software issue and not a NEO load issue.

pgaudet commented 1 year ago

Looking more closely:

P00509 (the E. coli entry) can be added in the Graph Editor using 'Add Individual', but not when using 'Add Annoton -> enabled by'. Likewise for the Form editor: P00509 does not autocomplete.

Compare the behavior with the human entry P99999, which works everywhere.

So it seems this ID can be found, but Noctua doesn't know it's an 'entity'.

Thanks, Pascale

kltm commented 1 year ago

@pgaudet Well, that may actually be it and this should be bumped back over to the NEO repo. Before doing so, it would be good to get feedback from @balhoff to double check if it's getting marked correctly.

vanaukenk commented 1 year ago

Just looked at the bacterial and human entities in noctua-amigo and they do seem to have different parentage:

image

image

I'm not sure why that is, though.

As an aside, should this ticket go into the Noctua maintenance project?

kltm commented 1 year ago

@vanaukenk Now that we have a feel that this is data and loading vs. "noctua" software, I'd tend towards data/qc. Honestly, it doesn't matter too too much for me how it gets accounted for.

kltm commented 1 year ago

@pgaudet I just wanted to check whether you are still having this issue. I can no longer reproduce what is written above in the graph editor: Screenshot from 2022-12-13 16-47-10

pgaudet commented 1 year ago

P00509 is not found on the landing page:

image

Nor is it found in the Add Annoton box:

image

However, it is found in 'Add Individual':

image

This looks the same as in October https://github.com/geneontology/neo/issues/111#issuecomment-1287332520

kltm commented 1 year ago

@pgaudet From my example above, it is found in enabled_by, but you have to search by label or by the full identifier (i.e. UniProtKB:P00509). Search has historically never supported searching by the internal-only portion of an identifier (an issue from way back); the general input in the graph editor is an exception to this.
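
As a rough illustration of that distinction, here is a toy Python sketch. It is not the actual Noctua search code: `split_curie`, `structured_match`, and the label are hypothetical, and matching is simplified to a case-insensitive prefix match against the label and the full identifier.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split 'UniProtKB:P00509' into ('UniProtKB', 'P00509')."""
    prefix, _, local_id = curie.partition(":")
    return prefix, local_id


def structured_match(query: str, curie: str, label: str) -> bool:
    """Toy matcher: the query must be a prefix of the full identifier or of the label."""
    q = query.lower()
    return curie.lower().startswith(q) or label.lower().startswith(q)


ENTITY_ID = "UniProtKB:P00509"
ENTITY_LABEL = "example label"  # hypothetical; not the real NEO label

# Searching with the full identifier matches.
full_id_hit = structured_match(ENTITY_ID, ENTITY_ID, ENTITY_LABEL)

# Searching with only the local part (after the prefix) does not.
local_part = split_curie(ENTITY_ID)[1]
local_id_hit = structured_match(local_part, ENTITY_ID, ENTITY_LABEL)

print(full_id_hit, local_id_hit)
```

Under these assumptions, the bare accession only matches if something else (for example a synonym) carries it.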

pgaudet commented 1 year ago

Thanks @kltm. It's super confusing that the behavior is different from human entries, e.g. Q06187

image image

and different in the various search boxes. Still looks like a bug to me.

kltm commented 1 year ago

I'd bet that those are actual synonyms that are being included; in the general search, it is breaking things up on its own.

pgaudet commented 1 year ago

The UniProt ID is stored as a synonym?

If that's the case can we do that for all species?

kltm commented 1 year ago

This isn't a case of species or type; it's about what is made available in the synonym field (I suspect). The general doc was specially created some time ago to just take things and grind them up without structure--including creating its own "synonyms" by cutting identifiers up--so it would kind of work for everything, but it has little structure connected to it, can't really be used for filtering, and has no closures. The doc type used for most of the search boxes (everything but the ubernoodle) is structured and can be used for filtering with closures, but it does not have the grab bag element in it, so it can only go off of what synonyms are offered.
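
A toy Python sketch of the two document types described above, for illustration only. The function names, the labels, and the synonym lists are all hypothetical (in particular, whether the human entry really carries its accession as a synonym is the suspicion stated above, not a confirmed fact), and real search does more than exact term lookup.

```python
import re
from typing import Iterable, Set


def general_doc_terms(curie: str, label: str, synonyms: Iterable[str]) -> Set[str]:
    """Unstructured 'grab bag': keep everything and also cut the identifier into pieces."""
    terms = {curie, label, *synonyms}
    # Create extra "synonyms" by splitting the identifier on ':' and whitespace.
    terms.update(part for part in re.split(r"[:\s]+", curie) if part)
    return {t.lower() for t in terms}


def structured_doc_terms(curie: str, label: str, synonyms: Iterable[str]) -> Set[str]:
    """Structured doc: only the identifier, the label, and the synonyms that were offered."""
    return {t.lower() for t in (curie, label, *synonyms)}


def found(query: str, terms: Set[str]) -> bool:
    """Exact, case-insensitive term lookup."""
    return query.lower() in terms


# Hypothetical inputs: one entity loaded without an accession synonym, one with.
ecoli = ("UniProtKB:P00509", "label A", [])
human = ("UniProtKB:Q06187", "label B", ["Q06187"])

ecoli_general = found("P00509", general_doc_terms(*ecoli))
ecoli_structured = found("P00509", structured_doc_terms(*ecoli))
human_structured = found("Q06187", structured_doc_terms(*human))

print(ecoli_general, ecoli_structured, human_structured)
```

In this sketch the bare accession is always found through the general doc, but through the structured doc only when the accession was supplied as a synonym, which would explain why two UniProt IDs can behave differently in the same search box.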

pgaudet commented 1 year ago

Are all genes processed the same way? Sorry, I still don't understand why a human UniProt ID behaves one way and an E. coli UniProt ID behaves differently.

kltm commented 1 year ago

@pgaudet It might be easiest to just have this on the agenda for our next call; we can walk through a couple of examples.