ParkinsonLab / MetaPro

GNU General Public License v3.0

output errors #15

Closed mimhmw closed 11 months ago

mimhmw commented 1 year ago

I ran the command below in tutorial mode and got the errors shown underneath. Could you figure out what the problem is?

python3 /pipeline/MetaPro.py -c /pipeline/Config.ini -s /input/PF_TruSeq.fastq --no-host --tutorial output -o /output
. . .
making Taxa summary
Traceback (most recent call last):
  File "/pipeline/Scripts/output_table_v3.py", line 186, in <module>
    rank_name.append(names_dict[taxid])
KeyError: '2'
Reformat RPKM for EC heatmap
Traceback (most recent call last):
  File "/pipeline/Scripts/output_reformat_rpkm_table.py", line 12, in <module>
    with open(input_rpkm, "r") as rpkm_file:
FileNotFoundError: [Errno 2] No such file or directory: '/output/outputs/final_results/RPKM_table.tsv'
2023-08-09 20:12:35.180214 running: output_network_generation
2023-08-09 20:12:35.180310 running: output_read_count
2023-08-09 20:12:35.182646 running: output_per_read_scores
2023-08-09 20:12:35.184891 running: output_ec_heatmap
2023-08-09 20:12:35.187970 output report phase 3 launched. waiting for sync
2023-08-09 20:12:35.188069 closing down processes: 3
2023-08-09 20:12:35.180455 generating read count table
collecting per-read quality
2023-08-09 20:12:35.185002 forming EC heatmap
Traceback (most recent call last):
  File "/pipeline/Scripts/output_EC_metrics.py", line 40, in <module>
    super_df = pd.read_csv(pathway_superpathway_file, sep = ',', skip_blank_lines = False)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '/database/path_to_superpath/pathway_to_superpathway.csv'
2023-08-09 20:14:04.020475 running: output_read_count
2023-08-09 20:14:04.020641 running: output_ec_heatmap
2023-08-09 20:14:04.020733 running: output_per_read_scores
Outputs: 122.7 s
Outputs cleanup: 0.0 s

MSchostag commented 1 year ago

Dear MetaPro developer,

I have the exact same issue when running the whole pipeline on the mouse tutorial dataset, with the databases downloaded using lib_downloader.py.

python3 /pipeline/MetaPro.py -c /meta_tut/config_mouse.ini -s /meta_tut/tutorial/mouse1.fastq -o /meta_tut/tutorial/231116_mouse_trial

Error that I get:

2023-11-15 09:59:08.690826 running: output_taxa_groupby
Generating RPKM and Cytoscape network
making Taxa summary
Traceback (most recent call last):
  File "/pipeline/Scripts/output_table_v3.py", line 186, in <module>
    rank_name.append(names_dict[taxid])
KeyError: '2'
Reformat RPKM for EC heatmap
Traceback (most recent call last):
  File "/pipeline/Scripts/output_reformat_rpkm_table.py", line 12, in <module>
    with open(input_rpkm, "r") as rpkm_file:
FileNotFoundError: [Errno 2] No such file or directory: '/meta_tut/231115_trial/outputs/final_results/RPKM_table.tsv'
2023-11-15 09:59:08.691932 output report phase 2 launched. waiting for sync
2023-11-15 09:59:08.691985 closing down processes: 2
2023-11-15 09:59:08.691998 closed down: 0/2
2023-11-15 09:59:12.253750 closed down: 1/2
2023-11-15 09:59:12.254583 running: output_network_generation
2023-11-15 09:59:12.254880 running: output_read_count
2023-11-15 09:59:12.258975 running: output_per_read_scores
2023-11-15 09:59:12.262682 running: output_ec_heatmap
2023-11-15 09:59:12.255344 generating read count table
collecting per-read quality
2023-11-15 09:59:12.262799 forming EC heatmap
Traceback (most recent call last):
  File "/pipeline/Scripts/output_EC_metrics.py", line 78, in <module>
    rpkm_df = pd.read_csv(rpkm_table_file, sep = '\t', skip_blank_lines = False)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, kwds)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

Is there any way this could be fixed?

Looking forward to your answer.

Kind regards, Morten Schostag - DTU

billytaj commented 12 months ago

Hi, there seems to be something funny happening in your table.
The script is failing on taxid 2 (Bacteria): the key "2" isn't in whatever file you're using for names.dmp. What does your config look like? Specifically, which file are you using for names: under [Databases]?

MSchostag commented 11 months ago

Dear Billy,

Thanks for replying. For names.dmp I used names_wevote.dmp, just like in the example you uploaded on GitHub. Is this wrong?

I have attached the config file config_low_mem.txt

Thanks for looking into this.

Kind regards, Morten

billytaj commented 11 months ago

walking through the checklist here:

your names_wevote.dmp is located in /meta_tut/databases/WEVOTE_db/names_wevote.dmp?

What does your singularity call look like? Need to know if it's simply not detecting the wevote file.

The brief explanation is: the output stage fails when it looks up taxid 2 in names_wevote.dmp. The only way it can hit this error is if names_wevote.dmp doesn't contain 2, but that's Bacteria. So something is off with the import of names, which is improbable since it's the file we supplied, so I'm following the chain.
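A quick way to test that directly is to scan the names file for the taxid. This is only a minimal sketch: `names_file_has_taxid` is a hypothetical helper, not part of MetaPro, and it assumes the file follows the standard NCBI names.dmp layout (pipe-delimited fields with the taxid first). names_wevote.dmp may use a different layout, and the pipeline's own parser may read it differently.

```python
def names_file_has_taxid(path, taxid):
    """Return True if any row of a names.dmp-style file has this taxid.

    Assumption: rows are pipe-delimited with the taxid in the first field,
    as in the NCBI taxonomy dump. Illustrative only, not MetaPro code.
    """
    wanted = str(taxid)
    with open(path, "r") as handle:
        for line in handle:
            # Take everything before the first pipe and drop the padding tabs.
            first_field = line.split("|", 1)[0].strip()
            if first_field == wanted:
                return True
    return False
```

For example, calling `names_file_has_taxid("/meta_tut/databases/WEVOTE_db/names_wevote.dmp", 2)` inside the container would show whether the file the config points at actually contains Bacteria.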

MSchostag commented 11 months ago

your names_wevote.dmp is located in /meta_tut/databases/WEVOTE_db/names_wevote.dmp?

I have also tried the mouse tutorial data, with all the databases downloaded from your server. Again I get the same error.

But there seems to be a difference between the two files you provide, names_wevote.dmp and names.dmp. Is there a reason there are two files? There is also a nodes.dmp and a nodes_wevote.dmp, so which one should I choose?

billytaj commented 11 months ago

0) I'm not seeing duplicates of names_wevote and names. What version of the pipeline are you using?
1) names.dmp and its counterpart nodes.dmp are created by NCBI as part of their taxonomy dump.
2) names_wevote.dmp is the same file, but the special name is for WEVOTE (which we'll be getting rid of in a future version; I need time to polish the next version).
3) You could pull your own copy of names.dmp and nodes.dmp, but WEVOTE would need them named accordingly, and you would have to restart taxonomic classification.
4) Are you able to confirm that taxonomic classification ran with no issues? (Specifically WEVOTE. If it can't sense names_wevote.dmp, then I have a suspicion that the failure is upstream too.)
5) Are you able to confirm that /mnt/raid2/mdesc contains a folder called WEVOTE_db/names_wevote.dmp? (I'm seeing a bunch of issues stemming from a bad bind-mount from other users.)
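A bad bind-mount usually shows up as config paths that do not exist inside the container. The sketch below is one hedged way to check for that; `missing_database_paths` is a hypothetical helper, not part of MetaPro, and it assumes only that the config is a standard INI file with a [Databases] section whose values are absolute paths.

```python
import configparser
import os


def missing_database_paths(config_path, section="Databases"):
    """Return (key, path) pairs from one INI section whose path does not exist.

    Assumption: every absolute-path value in the section should be visible
    from where this runs (for example, inside the container).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written in the file
    parser.read(config_path)
    if not parser.has_section(section):
        return []
    missing = []
    for key, value in parser.items(section):
        path = value.strip()
        if os.path.isabs(path) and not os.path.exists(path):
            missing.append((key, path))
    return missing
```

Run from inside the container against the config passed to `-c`, an empty list suggests the mount is fine. Some entries may be index prefixes rather than real files, so a reported path is a hint to inspect, not proof of a broken mount.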

MSchostag commented 11 months ago
0) I'm using the latest version: docker run -it -v /mnt/raid2/mdesc/:/meta_tut parkinsonlab/metapro:latest
1-3) Okay, thanks for the info.
4) Everything ran perfectly except the last part of the pipeline. See the attached stderr and stdout file output_231115_trial.txt
5) Yes. Here is the folder content:

root@b1674dc3fabf:/# ls -lh meta_tut/databases/WEVOTE_db/
total 577M
-rw-r--r-- 1 1714099238 132000513  17M Jul 19 2021 citations.dmp
-rw-r--r-- 1 1714099238 132000513 3.7M Jul 19 2021 delnodes.dmp
-rw-r--r-- 1 1714099238 132000513  442 Jul 19 2021 division.dmp
-rw-r--r-- 1 1714099238 132000513  15K Jul 19 2021 gc.prt
-rw-r--r-- 1 1714099238 132000513 4.5K Jul 19 2021 gencode.dmp
-rw-r--r-- 1 1714099238 132000513 907K Jul 19 2021 merged.dmp
-rw-r--r-- 1 1714099238 132000513 151M Jul 19 2021 names.dmp
-rw-r--r-- 1 1714099238 132000513 103M Jul 19 2021 names_wevote.dmp
-rw-r--r-- 1 1714099238 132000513 118M Jul 19 2021 nodes.dmp
-rw-r--r-- 1 1714099238 132000513 118M Jul 19 2021 nodes_wevote.dmp
-rw-r--r-- 1 1714099238 132000513 2.6K Jul 19 2021 readme.txt

Everything was downloaded through your script: lib_downloader.py

MSchostag commented 11 months ago

I just tried changing from names_wevote.dmp to names.dmp, and likewise from nodes_wevote.dmp to nodes.dmp, and now it works fine. So there must be something wrong with the nodes_wevote.dmp and names_wevote.dmp files coming from https://compsysbio.org/metapro_libs/.
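To narrow down how the two files differ, one could list the taxids that appear in names.dmp but not in names_wevote.dmp. This is a rough sketch under the assumption that both files are pipe-delimited with the taxid in the first field, as in the NCBI taxonomy dump; `read_taxids` and `taxids_missing_from` are hypothetical helpers, not MetaPro functions.

```python
def read_taxids(path):
    """Collect the first pipe-delimited field of every non-empty row."""
    taxids = set()
    with open(path, "r") as handle:
        for line in handle:
            first_field = line.split("|", 1)[0].strip()
            if first_field:
                taxids.add(first_field)
    return taxids


def taxids_missing_from(reference_path, candidate_path):
    """Return taxids present in the reference file but absent from the candidate."""
    return read_taxids(reference_path) - read_taxids(candidate_path)
```

If `"2"` is in the set returned by `taxids_missing_from("names.dmp", "names_wevote.dmp")`, that would match the KeyError seen above.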

billytaj commented 11 months ago

thanks for the info! I'll fix that