mhyleung closed 2 years ago
Hello @mhyleung
I've been refactoring and making some improvements and it seems I forgot to change that particular function.
In any case, please redownload Mantis and it should now work.
Please keep in mind sequence IDs such as >seq1 # 4012 # 4944 # 1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.666 are converted to >seq1.
This is done to save memory and to avoid parsing errors.
Sorry for the inconvenience.
Regards, Pedro
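The ID conversion described above can be sketched as follows. This is a minimal illustration, assuming the conversion simply keeps the first whitespace-delimited token of the header; `shorten_fasta_id` is a hypothetical helper, not a function from the Mantis source.

```python
def shorten_fasta_id(header_line: str) -> str:
    """Keep only the first whitespace-delimited token of a FASTA header.

    Hypothetical helper for illustration; Mantis's own parsing may differ.
    """
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line")
    tokens = header_line[1:].split()
    if not tokens:
        raise ValueError("empty FASTA header")
    return ">" + tokens[0]


# Header taken from the example in this thread.
header = (
    ">seq1 # 4012 # 4944 # 1 # ID=1_6;partial=00;start_type=ATG;"
    "rbs_motif=None;rbs_spacer=None;gc_cont=0.666"
)
print(shorten_fasta_id(header))  # prints >seq1
```

Note that two headers sharing the same first token would collapse to the same ID under this scheme, so it is worth checking that the shortened IDs in your file are unique.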
Hi Pedro
Thanks for the message. I have removed and reinstalled Mantis. Same as last time, I ran the test, and it gave:
Traceback (most recent call last):
  File "/mydrive/tools/miniconda3/envs/mantis_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mydrive/tools/miniconda3/envs/mantis_env/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "mantis/__main__.py", line 241, in <module>
    run_mantis_test(target_path=add_slash(MANTIS_FOLDER + 'tests')+ 'test_sample.faa',
  File "mantis/source/MANTIS.py", line 104, in run_mantis_test
    mantis.run_mantis_test()
  File "mantis/source/utils.py", line 622, in wrapper
    res = f(self, *args, **kwargs)
  File "mantis/source/MANTIS.py", line 444, in run_mantis_test
    self.generate_translated_sample()
  File "mantis/source/MANTIS.py", line 398, in generate_translated_sample
    translation_tables = parse_translation_tables(ncbi_resources + 'gc.prt')
  File "mantis/source/utils.py", line 1168, in parse_translation_tables
    with open(ncbi_tables) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/mydrive/tools/mantis/Resources/NCBI/gc.prt'
So what I did, as I did last time, was to download gc.prt manually into the Resources/NCBI directory using wget ftp.ncbi.nih.gov/entrez/misc/data/gc.prt. The test ran fine afterwards, but a different error came back when I did an actual run:
Process Process-185:
Traceback (most recent call last):
  File "/mydrive/tools/miniconda3/envs/mantis_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/mydrive/tools/miniconda3/envs/mantis_env/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "mantis/source/Multiprocessing.py", line 579, in worker_interpret_output
    self.generate_integrated_output(output_annotation_tsv, interpreted_annotation_tsv)
  File "mantis/source/Metadata.py", line 281, in generate_integrated_output
    output_annotation = self.read_and_interpret_output_annotation(output_annotation_tsv)
  File "mantis/source/Metadata.py", line 193, in read_and_interpret_output_annotation
    ref_file_links = self.get_hit_links(links_to_get[ref_file], ref_file)
  File "mantis/source/Metadata.py", line 112, in get_hit_links
    self.get_link_compiled_metadata(dict_hits=dict_hits, ref_file_path=self.mantis_paths['pfam'] + 'metadata.tsv')
  File "mantis/source/Metadata.py", line 45, in get_link_compiled_metadata
    hit_info=cursor.fetch_metadata(hit)
  File "mantis/source/Metadata_SQLITE_Connector.py", line 148, in fetch_metadata
    res=self.convert_sql_to_dict(res_fetch)
  File "mantis/source/Metadata_SQLITE_Connector.py", line 133, in convert_sql_to_dict
    sql_result=sql_result[1:]
TypeError: 'NoneType' object is not subscriptable
Could this be related to the fact that somehow gc.prt was not set up properly during database setup? Thanks
Marcus
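For context on the TypeError above: in Python's `sqlite3` module, `cursor.fetchone()` returns `None` when a query matches no row, and slicing that `None` raises exactly this error. The sketch below reproduces the situation and shows one way to guard against it. The table layout, the `fetch_metadata_safe` name, and the sample row are assumptions made up for illustration; they are not taken from the Mantis source, and this is not necessarily how the issue was fixed there.

```python
import sqlite3


def fetch_metadata_safe(cursor, hit):
    """Return metadata for a hit, or an empty dict when the hit is absent.

    Hypothetical helper; the table and column names are assumptions.
    """
    cursor.execute("SELECT ref, description FROM metadata WHERE ref = ?", (hit,))
    row = cursor.fetchone()  # None when no row matches the query
    if row is None:
        return {}
    fields = row[1:]  # drop the key column, as the traceback's slice does
    return {"description": fields[0]}


# In-memory database with a single made-up entry.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE metadata (ref TEXT PRIMARY KEY, description TEXT)")
cur.execute("INSERT INTO metadata VALUES (?, ?)", ("hit_A", "example entry"))
conn.commit()

print(fetch_metadata_safe(cur, "hit_A"))
print(fetch_metadata_safe(cur, "missing_hit"))
```

A guard like this only hides the symptom; a hit that is missing from the metadata database usually means the database itself is incomplete.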
Hello Marcus,
Thanks for notifying me about the gc.prt issue. I have fixed the link now, so it should download fine during setup.
Regarding your second issue, it seems you are having issues with your metadata SQL database. I can't reproduce this error locally so I added some new code to check the database for errors.
Could you please download the new mantis version and run check_sql? Please report the results afterwards and I will do my best to fix it.
Regards, Pedro
Thanks Pedro as always! I am going to set up my database first then run check_sql to see what the matter is. Cheers :)
M
Hi Pedro
Just wondering where the check_sql command would be? I am trying to run that now. Thanks
Marcus
Hello Marcus,
I forgot to add the option to run it when executing mantis (my bad). Could you please redownload the source code and try to run check_sql as you would with run_mantis or setup_databases?
I don't know if you did in the past, but there is no need to run setup_databases again; just update the __main__.py file and the files in the source folder.
Regards, Pedro
Hi Pedro. I am running the check_sql command now. It appears that some of the verbose messages were in yellow and some in white. For example:
Checking /my_path/tools/mantis/References/NCBI/28211/metadata.db (this was in yellow)
Checking /my_path/tools/mantis/References/NCBI/43080/metadata.db (this was in yellow)
Checking /my_path/tools/mantis/References/NOG/4447/metadata.db (this was in yellow)
Creating SQL database /my_path/tools/mantis/References/NOG/583/metadata.db (this was in white)
Creating SQL database /my_path/tools/mantis/References/NOG/84406/metadata.db (this was in white)
and so forth...
Does that mean that some of these were skipped during the database setup step, and that the check_sql command detected the missing databases and is now adding them back?
The command is still running as I type, so I will get back to you as to whether everything is running smoothly. Thanks
Marcus
Hello Marcus, Yes, the yellow lines are the output from check_sql, the white lines correspond to the creation of these SQL databases. Since these are only created when you use the corresponding reference, you are seeing them now as you ran check_sql. Anyway, keep me posted.
Regards, Pedro
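Independently of Mantis, the metadata.db files can be inspected at the SQLite level. The sketch below walks a directory tree and runs SQLite's built-in `PRAGMA integrity_check` on every metadata.db it finds. It is not what check_sql does (that command checks the metadata entries themselves); `check_metadata_dbs` and the demo directory layout are assumptions for illustration.

```python
import os
import sqlite3
import tempfile


def check_metadata_dbs(root):
    """Run SQLite's integrity check on every metadata.db found under root."""
    results = {}
    for folder, _subdirs, files in os.walk(root):
        if "metadata.db" in files:
            path = os.path.join(folder, "metadata.db")
            conn = sqlite3.connect(path)
            try:
                # The first row is ('ok',) when no problems are found.
                results[path] = conn.execute("PRAGMA integrity_check").fetchone()[0]
            except sqlite3.DatabaseError as error:
                results[path] = f"unreadable: {error}"
            finally:
                conn.close()
    return results


# Demo on a throwaway directory that mimics a References/NOG/<id> layout.
with tempfile.TemporaryDirectory() as root:
    db_dir = os.path.join(root, "NOG", "583")
    os.makedirs(db_dir)
    conn = sqlite3.connect(os.path.join(db_dir, "metadata.db"))
    conn.execute("CREATE TABLE metadata (ref TEXT, description TEXT)")
    conn.commit()
    conn.close()
    demo_results = check_metadata_dbs(root)
    for path, status in demo_results.items():
        print(path, status)
```

An "ok" result only means the file is a structurally valid SQLite database; it says nothing about whether every expected metadata entry is present.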
Hello Marcus,
Could you please let me know if this issue was resolved?
Regards, Pedro
Hi Pedro
For some reason my check_sql command is still running after 5 days! It has been very slow, but it's running, so I am just going to leave it running for now. I shall let you know when it is done. Thanks
Marcus
Hello Marcus,
That is a bit strange; it has never taken me that long. Is the command still displaying "checking ****/metadata.db"? But I guess depending on your system it's possible, since check_sql is quite comprehensive (i.e., it checks all metadata entries).
I have recently added indexation to the SQL databases so it should be a lot faster to fetch information from it.
If you'd like, you could try to redownload the source code, remove all metadata.db files in your reference databases and restart the check_sql command. This will also result in faster mantis runs further down the line, so it might be worth doing now.
Sorry for the inconvenience. This new method for metadata retrieval should be way better than the past one (txt parsing), but I still had to iron out a few kinks on how to make it efficient. Hopefully everything should now be working as intended.
Regards, Pedro
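To illustrate why indexing speeds up metadata lookups, here is a small generic SQLite sketch. The table, column, and index names are made up for the example and are not Mantis's actual schema. It compares the query plan for the same lookup before and after creating an index on the lookup column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical metadata table with many made-up entries.
cur.execute("CREATE TABLE metadata (ref TEXT, description TEXT)")
cur.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    ((f"hit_{i}", f"entry {i}") for i in range(10000)),
)


def query_plan(cursor, hit):
    """Return SQLite's plan description for a lookup by ref."""
    cursor.execute(
        "EXPLAIN QUERY PLAN SELECT description FROM metadata WHERE ref = ?",
        (hit,),
    )
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in cursor.fetchall())


plan_before = query_plan(cur, "hit_42")  # full table scan
cur.execute("CREATE INDEX ref_index ON metadata (ref)")
plan_after = query_plan(cur, "hit_42")  # search using ref_index

print(plan_before)
print(plan_after)
```

Without the index, every lookup reads the whole table; with it, SQLite jumps straight to the matching rows, which matters when millions of hits are fetched one by one.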
Hey Pedro
Alright. I will try again and let you know.
M
OK, that was indeed orders of magnitude faster :). I have already set up the databases, and my check_sql command seemed to run alright. I shall do a trial run of my small sample now. I'll keep you updated!
M
Hi Pedro
I believe the run went alright, with the following output files?
.
├── consensus_annotation.tsv
├── integrated_annotation.tsv
├── Mantis.out
└── output_annotation.tsv
Cheers
Marcus
Hey Marcus,
Glad to hear that. Yes, those are the output files. There are also a few other output files you could generate, but this depends on what kind of analysis you want to do. Anyway, if you are curious, please check the wiki, where I describe those and also the default output files.
Regards, Pedro
Dear all
I first performed the run_test option of mantis and it worked fine. Then I set up python mantis run_mantis with test_sample.faa and it also worked fine. However, when I tried to run my own test samples (also .faa) of around 30,000 contig amino acid sequences, it gave me the following error:
I notice two things that are different between my .faa file and the test_sample.faa:
1) the FASTA IDs in test_sample.faa do not contain special characters, whereas my .faa file has sequences with IDs like this:
>seq1 # 4012 # 4944 # 1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.666
2) my FASTA sequences contain an asterisk at the end of the sequence, indicating a stop codon
Could either of these two factors have contributed to the error? If not, what could be causing it? Thanks
Marcus
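To test whether the trailing asterisks matter, one option is to strip them from a copy of the input file before running it. The thread does not establish that the asterisk caused the error, so treat this as a diagnostic sketch only; `clean_records` is a hypothetical helper and the sequences in the demo are made up.

```python
def clean_records(lines):
    """Yield (header, sequence) pairs with trailing '*' removed from each sequence.

    Hypothetical helper: joins multi-line sequences and strips stop-codon
    asterisks from the end of each record only.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).rstrip("*")
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).rstrip("*")


# Made-up records: one multi-line sequence ending in '*', one without.
demo_lines = [">seq1 # 4012 # 4944 # 1", "MK", "T*", ">seq2", "MV"]
cleaned = list(clean_records(demo_lines))
for header, sequence in cleaned:
    print(header)
    print(sequence)
```

Running the original file and the cleaned copy side by side would show whether the asterisk is involved at all.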