biocypher / metalinks

GNU General Public License v3.0
Reproducing Metalinks - pypath errors #4

Closed slobentanzer closed 5 months ago

slobentanzer commented 5 months ago

Hi @EliasFarr, trying to reproduce the metalinks build, I encounter problems with pypath. It is a little complex, but I'll try to summarise. There have been refactorings in pypath that changed quite a bit of the internal structure. Starting with the original version used, 0.14.48, I get this error:

INFO -- Downloading uniprot data...
Traceback (most recent call last):
  File "/Users/slobentanzer/GitHub/metalinks/create_knowledge_graph.py", line 288, in <module>
    main()
  File "/Users/slobentanzer/GitHub/metalinks/create_knowledge_graph.py", line 222, in main
    UNIPROT.download_uniprot_data(
  File "/Users/slobentanzer/GitHub/metalinks/metalinks/adapters/uniprot_metalinks.py", line 138, in download_uniprot_data
    self._download_uniprot_data()
  File "/Users/slobentanzer/GitHub/metalinks/metalinks/adapters/uniprot_metalinks.py", line 158, in _download_uniprot_data
    self.uniprot_ids = list(uniprot._all_uniprots(self.organism, self.rev))
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/inputs/uniprot.py", line 82, in _all_uniprots
    l.strip() for l in data.split('\n')[1:] if l.strip()
AttributeError: 'NoneType' object has no attribute 'split'

Probably due to a change in the pypath.inputs module. Going to 0.15.4, which is used by the CROssBAR v2 team, this is resolved, but then I get an issue trying to map names using the pypath.mapping module.

Traceback (most recent call last):
  File "/Users/slobentanzer/GitHub/metalinks/create_knowledge_graph.py", line 288, in <module>
    main()
  File "/Users/slobentanzer/GitHub/metalinks/create_knowledge_graph.py", line 258, in main
    bc.write_nodes(UNIPROT.get_nodes())
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/biocypher/_core.py", line 284, in write_nodes
    if not isinstance(nodes.peek(), BioCypherNode):
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/more_itertools/more.py", line 342, in peek
    self._cache.append(next(self._it))
  File "/Users/slobentanzer/GitHub/metalinks/metalinks/adapters/uniprot_metalinks.py", line 319, in get_nodes
    symbol = mapping.map_name(protein_id.split(":")[1], "uniprot", "genesymbol")
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/utils/mapping.py", line 3592, in map_name
    return mapper.map_name(
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/share/common.py", line 2786, in wrapper
    return func(*args, **kwargs)
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/utils/mapping.py", line 2041, in map_name
    uniprots = self._map_name(
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/utils/mapping.py", line 2542, in _map_name
    tbl = self.which_table(
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/utils/mapping.py", line 1602, in which_table
    self.load_mapping(
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/utils/mapping.py", line 3240, in load_mapping
    reader = MapReader(param = resource, **kwargs)
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/utils/mapping.py", line 258, in __init__
    self.load()
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/utils/mapping.py", line 288, in load
    self.read()
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/utils/mapping.py", line 450, in read
    getattr(self, method)()
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/utils/mapping.py", line 561, in read_mapping_file
    for i, line in enumerate(infile):
  File "/Users/slobentanzer/GitHub/metalinks/.venv/lib/python3.10/site-packages/pypath/inputs/uniprot.py", line 340, in get_uniprot_sec
    proteome = all_uniprots(organism=organism)
NameError: name 'all_uniprots' is not defined. Did you mean: '_all_uniprots'?

Again, probably some internal change that creates an application problem with this particular version. Going to the latest version (0.16.10), we have another issue caused by the refactoring into different packages (e.g., pypath_common):

Traceback (most recent call last):
  File "/Users/slobentanzer/GitHub/metalinks/create_knowledge_graph.py", line 288, in <module>
    main()
  File "/Users/slobentanzer/GitHub/metalinks/create_knowledge_graph.py", line 222, in main
    UNIPROT.download_uniprot_data(
  File "/Users/slobentanzer/GitHub/metalinks/metalinks/adapters/uniprot_metalinks.py", line 130, in download_uniprot_data
    stack.enter_context(settings.context(retries=retries))
AttributeError: module 'pypath.share.settings' has no attribute 'context'

In this version, since the settings were moved to another library/module, we don't have access to the context management any more.
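
One possible way to keep the adapter's `stack.enter_context(...)` pattern without relying on `settings.context` is a small local context manager that overrides attributes on a settings object and restores them afterwards. This is only a sketch: `temporary_settings` and `fake_settings` are hypothetical names, the stand-in object is not the real pypath settings module, and whether the refactored pypath exposes `retries` as a plain attribute is an assumption that would need checking.

```python
import contextlib
import types


@contextlib.contextmanager
def temporary_settings(settings_obj, **overrides):
    """Temporarily override attributes on a settings object, then restore them."""
    missing = object()  # sentinel for attributes that did not exist before
    previous = {
        name: getattr(settings_obj, name, missing) for name in overrides
    }
    try:
        for name, value in overrides.items():
            setattr(settings_obj, name, value)
        yield settings_obj
    finally:
        # Put every overridden attribute back to its earlier state.
        for name, old in previous.items():
            if old is missing:
                if hasattr(settings_obj, name):
                    delattr(settings_obj, name)
            else:
                setattr(settings_obj, name, old)


# Stand-in for a settings module; NOT the real pypath settings object.
fake_settings = types.SimpleNamespace(retries=3)

# Same ExitStack pattern as in the adapter's download function.
with contextlib.ExitStack() as stack:
    stack.enter_context(temporary_settings(fake_settings, retries=10))
    inside = fake_settings.retries

after = fake_settings.retries
```

Inside the `with` block the override is active; once the stack exits, the earlier value is back in place.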

@EliasFarr and @dbdimitrov, would you maybe like to look into what is the best way to resolve this (which version to use, which part of the metalinks implementation to adapt to that version) together with Dénes?

dbdimitrov commented 5 months ago

Hi @slobentanzer,

You mentioned another uniprot adapter independent of pypath.

Can you point us to that one? And maybe give us a summary of its contents? We're interested in diseases, tissues, and cellular locations.

Thanks in advance!

slobentanzer commented 5 months ago

Sure, it is at https://github.com/IGVF-DACC/igvf-catalog/blob/main/data/adapters/uniprot_adapter.py. Don't know much more than what they state in the comments of that file, though. Never worked with Uniprot flat files before.

slobentanzer commented 5 months ago

BTW, the problem with pypath.utils.mapping is not only caused by the Uniprot adapter, but generally by the use of the map_names function (e.g. in the cellphone adapter).

dbdimitrov commented 5 months ago

@slobentanzer @EliasFarr I found the issue and was able to reproduce the whole workflow.

It's just a typo in pypath: https://github.com/saezlab/pypath/blob/9cab02d68a31e7ddaf64ba9ed2965148991da423/pypath/inputs/uniprot.py#L346

I will make a PR to pypath and we can install pypath from my fork until it gets merged.
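
The failure pattern from the second traceback (a function calling `all_uniprots` while only `_all_uniprots` is defined) can be reproduced on a stand-in module, together with a possible stopgap: aliasing the private function under the missing name. This is a sketch under assumptions, not pypath's code and not the fix used in this thread (which is the fork). `fake_uniprot` and the returned IDs are made up for illustration, and whether the same alias is sufficient for the real `pypath.inputs.uniprot` module is untested.

```python
import types

# Stand-in module reproducing the failure pattern from the traceback;
# this is NOT pypath's code, only the same kind of misspelled call.
fake_uniprot = types.ModuleType("fake_uniprot")
exec(
    '''
def _all_uniprots(organism=9606, swissprot=None):
    return ["P00533", "P04637"]

def get_uniprot_sec(organism=9606):
    # Calls the name without the leading underscore, which is not defined.
    return all_uniprots(organism=organism)
''',
    fake_uniprot.__dict__,
)

# Before the workaround, the call fails with a NameError.
try:
    fake_uniprot.get_uniprot_sec()
    error_before = None
except NameError as exc:
    error_before = str(exc)

# Workaround: expose the private function under the name the caller expects.
# Functions look up globals in their module's namespace at call time,
# so the alias is visible to get_uniprot_sec without redefining it.
fake_uniprot.all_uniprots = fake_uniprot._all_uniprots
result_after = fake_uniprot.get_uniprot_sec()
```

Such a patch would have to run before the first call into the mapping code, and it hides the underlying bug rather than fixing it, so a pinned fork or an upstream fix remains the cleaner route.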

dbdimitrov commented 5 months ago

https://github.com/dbdimitrov/pypath/tree/metalinks

A frozen pypath version that actually works; the latest update has other issues.

slobentanzer commented 5 months ago

@dbdimitrov nice! A typo... who needs unit tests, right? 🥲

~So we install that fork from GitHub for now?~ I should read the entire text of the post, I guess.

slobentanzer commented 5 months ago

With the forked version it works for me as well, good job @dbdimitrov. :) Closing this, and I consider Metalinks "reproducible" for now. Added a TODO for integrating the BioCypher Resource class for the download instead of that manual function, but that is optional and more of a convenience / harmonisation aspect.

dbdimitrov commented 5 months ago

@slobentanzer I agree about the Resource class, I only remembered now that you mentioned it.

I link this here as a reference for myself.