PedroMTQ / mantis

A package to annotate protein sequences
MIT License

'set' object has no attribute 'append' error #32

Status: Closed (closed by mhyleung 2 years ago)

mhyleung commented 2 years ago

Dear all

I first ran mantis with the run_test option and it worked fine. Then I ran python mantis run_mantis with test_sample.faa and it also worked fine. However, when I tried to run my own test samples (also .faa) of around 30,000 contig amino acid sequences, it gave me the following error:

Traceback (most recent call last):
  File "/mypath/miniconda3/envs/mantis_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/mypath/miniconda3/envs/mantis_env/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "mantis/source/Multiprocessing.py", line 579, in worker_interpret_output
    self.generate_integrated_output(output_annotation_tsv, interpreted_annotation_tsv)
  File "mantis/source/Metadata.py", line 281, in generate_integrated_output
    output_annotation = self.read_and_interpret_output_annotation(output_annotation_tsv)
  File "mantis/source/Metadata.py", line 193, in read_and_interpret_output_annotation
    ref_file_links = self.get_hit_links(links_to_get[ref_file], ref_file)
  File "mantis/source/Metadata.py", line 125, in get_hit_links
    self.get_common_links(hit, dict_hits[hit])
  File "mantis/source/Metadata.py", line 66, in get_common_links
    self.add_to_dict(res, 'pfam', pfam)
  File "mantis/source/Metadata.py", line 39, in add_to_dict
    dict_hits['link'][dict_key].append(i)
AttributeError: 'set' object has no attribute 'append'
Ran into an issue, check the log for details. Exitting!

I notice two things that are different between my .faa file and the test_sample.faa:

1) the fasta IDs in test_sample.faa do not contain special characters, whereas my .faa file has sequence IDs like this: >seq1 # 4012 # 4944 # 1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.666

2) my fasta sequences contain an asterisk at the end of the sequence, indicating a stop codon

Could either of these two factors have contributed to the error? If not, what could be causing it? Thanks

Marcus
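
For anyone hitting the same traceback: the last frame shows list-style `.append()` being called on a `set`, which Python sets do not support (they use `.add()`). The following is a minimal sketch of how this kind of error arises and how it can be avoided. The `add_to_dict` function here is a hypothetical reconstruction based only on the names visible in the traceback, not Mantis's actual code, and the Pfam-style IDs are made-up example values.

```python
def add_to_dict(dict_hits, dict_key, values):
    """Hypothetical helper: collect values under dict_hits['link'][dict_key] as a set."""
    links = dict_hits.setdefault('link', {})
    bucket = links.setdefault(dict_key, set())
    for value in values:
        bucket.add(value)  # sets use .add(); .append() exists only on lists
    return dict_hits


hits = {'link': {'pfam': set()}}

# Reproduce the failure: calling .append() on a set raises AttributeError.
try:
    hits['link']['pfam'].append('PF00001')
except AttributeError as err:
    error_message = str(err)

# The set-aware helper works and also drops duplicate values.
add_to_dict(hits, 'pfam', ['PF00001', 'PF00001', 'PF00002'])
```

This kind of mismatch typically appears when a container is changed from a list to a set (or the reverse) and one call site is not updated.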

PedroMTQ commented 2 years ago

Hello @mhyleung

I've been refactoring and making some improvements and it seems I forgot to change that particular function. In any case, please redownload Mantis and it should now work. Please keep in mind sequence IDs such as >seq1 # 4012 # 4944 # 1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.666 are converted to >seq1. This is done to save memory and to avoid parsing errors.

Sorry for the inconvenience.

Regards, Pedro
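
The header conversion described above (a long header reduced to `>seq1`) can be sketched as follows. This assumes the ID is taken as the first whitespace-delimited token of the header line, which matches the example given but is not confirmed as Mantis's exact rule; `short_id` and `shorten_fasta_headers` are hypothetical names used only for illustration.

```python
def short_id(header_line):
    """Return the first whitespace-delimited token of a FASTA header, without '>'."""
    parts = header_line.lstrip('>').split()
    return parts[0] if parts else ''


def shorten_fasta_headers(lines):
    """Yield FASTA lines with each header reduced to '>' plus its short ID."""
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('>'):
            yield '>' + short_id(line)
        else:
            yield line  # sequence lines are passed through unchanged


example = [
    '>seq1 # 4012 # 4944 # 1 # ID=1_6;partial=00;start_type=ATG;'
    'rbs_motif=None;rbs_spacer=None;gc_cont=0.666\n',
    'MKVLAAGIVG\n',
]
shortened = list(shorten_fasta_headers(example))
```

One consequence worth keeping in mind: if two headers share the same first token, they become indistinguishable after this kind of truncation, so the first token should be unique per sequence.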

mhyleung commented 2 years ago

Hi Pedro

Thanks for the message. I have removed and reinstalled mantis. As I did last time, I ran the test first, and it gave:

Traceback (most recent call last):
  File "/mydrive/tools/miniconda3/envs/mantis_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mydrive/tools/miniconda3/envs/mantis_env/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "mantis/__main__.py", line 241, in <module>
    run_mantis_test(target_path=add_slash(MANTIS_FOLDER + 'tests')+ 'test_sample.faa',
  File "mantis/source/MANTIS.py", line 104, in run_mantis_test
    mantis.run_mantis_test()
  File "mantis/source/utils.py", line 622, in wrapper
    res = f(self, *args, **kwargs)
  File "mantis/source/MANTIS.py", line 444, in run_mantis_test
    self.generate_translated_sample()
  File "mantis/source/MANTIS.py", line 398, in generate_translated_sample
    translation_tables = parse_translation_tables(ncbi_resources + 'gc.prt')
  File "mantis/source/utils.py", line 1168, in parse_translation_tables
    with open(ncbi_tables) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/mydrive/tools/mantis/Resources/NCBI/gc.prt'

So, as I did last time, I downloaded gc.prt manually into the Resources/NCBI directory using wget ftp.ncbi.nih.gov/entrez/misc/data/gc.prt, and the test ran fine afterwards. However, a different error came up when I did the actual run:

Process Process-185:
Traceback (most recent call last):
  File "/mydrive/tools/miniconda3/envs/mantis_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/mydrive/tools/miniconda3/envs/mantis_env/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "mantis/source/Multiprocessing.py", line 579, in worker_interpret_output
    self.generate_integrated_output(output_annotation_tsv, interpreted_annotation_tsv)
  File "mantis/source/Metadata.py", line 281, in generate_integrated_output
    output_annotation = self.read_and_interpret_output_annotation(output_annotation_tsv)
  File "mantis/source/Metadata.py", line 193, in read_and_interpret_output_annotation
    ref_file_links = self.get_hit_links(links_to_get[ref_file], ref_file)
  File "mantis/source/Metadata.py", line 112, in get_hit_links
    self.get_link_compiled_metadata(dict_hits=dict_hits, ref_file_path=self.mantis_paths['pfam'] + 'metadata.tsv')
  File "mantis/source/Metadata.py", line 45, in get_link_compiled_metadata
    hit_info=cursor.fetch_metadata(hit)
  File "mantis/source/Metadata_SQLITE_Connector.py", line 148, in fetch_metadata
    res=self.convert_sql_to_dict(res_fetch)
  File "mantis/source/Metadata_SQLITE_Connector.py", line 133, in convert_sql_to_dict
    sql_result=sql_result[1:]
TypeError: 'NoneType' object is not subscriptable

Could this be related to the fact that somehow gc.prt was not set up properly during database setup? Thanks

Marcus
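
The manual workaround for the missing gc.prt file can be sketched as a small "download only if missing" helper. This is a sketch under stated assumptions, not part of Mantis: `ensure_file` is a hypothetical name, and the `https://` scheme in the URL is an assumption (the wget command above gives only the host and path). The demonstration uses a stand-in fetcher so it runs without network access.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

# Host and path are taken from the wget command above; the scheme is assumed.
GC_PRT_URL = 'https://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt'


def ensure_file(target, url, fetch=urlretrieve):
    """Download url to target unless a non-empty file is already there.

    Returns True if a download was triggered, False otherwise.
    """
    target = Path(target)
    if target.is_file() and target.stat().st_size > 0:
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, str(target))
    return True


# Offline demonstration with a stand-in fetcher (no network access needed).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'Resources' / 'NCBI' / 'gc.prt'
    fake_fetch = lambda url, dest: Path(dest).write_text('placeholder')
    first = ensure_file(target, GC_PRT_URL, fetch=fake_fetch)   # file was missing
    second = ensure_file(target, GC_PRT_URL, fetch=fake_fetch)  # already present
```

For a real download, calling `ensure_file(path_to_gc_prt, GC_PRT_URL)` with the default fetcher would use `urllib.request.urlretrieve`.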

PedroMTQ commented 2 years ago

Hello Marcus,

Thanks for notifying me about the gc.prt issue. I have fixed the link now, so it should download fine during setup. Regarding your second issue, it seems you are having issues with your metadata SQL database. I can't reproduce this error locally, so I added some new code to check the database for errors. Could you please download the new mantis version and run check_sql? Please report the results afterwards and I will do my best to fix it.

Regards, Pedro
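
For context on the second traceback: `'NoneType' object is not subscriptable` is what Python raises when a result that turned out to be `None` is sliced or indexed. With the standard `sqlite3` module, `cursor.fetchone()` returns `None` when no row matches the query. The following is a minimal sketch of that failure mode and a guard against it; the table name, columns, and IDs are hypothetical and do not reflect Mantis's actual schema.

```python
import sqlite3


def fetch_metadata(cursor, ref):
    """Return metadata for ref as a dict, or None if the database has no such row."""
    cursor.execute('SELECT ref, description FROM metadata WHERE ref = ?', (ref,))
    row = cursor.fetchone()
    if row is None:  # no matching row: report it instead of slicing None
        return None
    return {'ref': row[0], 'fields': row[1:]}


conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE metadata (ref TEXT, description TEXT)')  # hypothetical schema
cur.execute("INSERT INTO metadata VALUES ('PF00001', 'example description')")

found = fetch_metadata(cur, 'PF00001')
missing = fetch_metadata(cur, 'PF99999')

# Without the guard, slicing the missing result reproduces the TypeError.
try:
    cur.execute('SELECT * FROM metadata WHERE ref = ?', ('PF99999',))
    cur.fetchone()[1:]
except TypeError as err:
    unguarded_error = str(err)

conn.close()
```

A missing row like this usually points to an incomplete or out-of-date database rather than a problem with the input sequences.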

mhyleung commented 2 years ago

Thanks Pedro as always! I am going to set up my database first then run check_sql to see what the matter is. Cheers :)

M

mhyleung commented 2 years ago

Hi Pedro

Just wondering where the check_sql command would be? I am trying to run that now. Thanks

Marcus

PedroMTQ commented 2 years ago

Hello Marcus,

I forgot to add the option to run it when executing mantis (my bad). Could you please redownload the source code and try to run check_sql as you would run_mantis or setup_databases? I don't know whether you already did so, but there is no need to run setup_databases again; just update the __main__.py file and the files in the source folder.

Regards, Pedro

mhyleung commented 2 years ago

Hi Pedro. I am running the check_sql command now. It appears that some of the verbose messages were in yellow and some in white. For example:

Checking /my_path/tools/mantis/References/NCBI/28211/metadata.db (this was in yellow)
Checking /my_path/tools/mantis/References/NCBI/43080/metadata.db (this was in yellow)
Checking /my_path/tools/mantis/References/NOG/4447/metadata.db (this was in yellow)
Creating SQL database /my_path/tools/mantis/References/NOG/583/metadata.db (this was in white)
Creating SQL database /my_path/tools/mantis/References/NOG/84406/metadata.db (this was in white)

and so forth...

Does that mean that some of these were skipped during the database setup step, and that check_sql has now detected the missing databases and is adding them back?

The command is still running as I type, so I will get back to you on whether everything is running smoothly. Thanks

Marcus

PedroMTQ commented 2 years ago

Hello Marcus,

Yes, the yellow lines are the output from check_sql, and the white lines correspond to the creation of these SQL databases. Since the databases are only created when you use the corresponding reference, you are seeing them being created now because you ran check_sql. Anyway, keep me posted.

Regards, Pedro
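
As a rough illustration of what a sweep over per-reference `metadata.db` files can look like, the sketch below walks a reference folder and runs SQLite's built-in `PRAGMA integrity_check` on each file. This is not what Mantis's check_sql does internally (its implementation is not shown in this thread); `check_metadata_dbs` is a hypothetical helper, and the folder layout in the demonstration only mimics the paths quoted above.

```python
import sqlite3
import tempfile
from pathlib import Path


def check_metadata_dbs(references_dir):
    """Map each metadata.db path under references_dir to an integrity status."""
    results = {}
    for db_path in sorted(Path(references_dir).rglob('metadata.db')):
        conn = sqlite3.connect(str(db_path))
        try:
            # Returns a single row containing 'ok' when no problems are found.
            status = conn.execute('PRAGMA integrity_check').fetchone()[0]
        except sqlite3.DatabaseError as err:
            status = f'unreadable: {err}'
        finally:
            conn.close()
        results[str(db_path)] = status
    return results


# Demonstration on a temporary folder with one valid and one corrupt file.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / 'NOG' / '583' / 'metadata.db'
    bad = Path(tmp) / 'NOG' / '4447' / 'metadata.db'
    good.parent.mkdir(parents=True)
    bad.parent.mkdir(parents=True)

    conn = sqlite3.connect(str(good))
    conn.execute('CREATE TABLE metadata (ref TEXT, description TEXT)')
    conn.commit()
    conn.close()
    bad.write_bytes(b'x' * 4096)  # not a valid SQLite file

    report = check_metadata_dbs(tmp)
    good_status = report[str(good)]
    bad_status = report[str(bad)]
```

An integrity check of this kind only validates the file structure; it does not verify that every expected metadata entry is present.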

PedroMTQ commented 2 years ago

Hello Marcus,

Could you please let me know if this issue was resolved?

Regards, Pedro

mhyleung commented 2 years ago

Hi Pedro

For some reason my check_sql command is still running after 5 days! It has been very slow, but it is running, so I am just going to leave it running for now. I shall let you know when it is done. Thanks

Marcus

PedroMTQ commented 2 years ago

Hello Marcus,

That is a bit strange; it has never taken me that long. Is the command still displaying "checking ****/metadata.db"? I guess it is possible depending on your system, since check_sql is quite comprehensive (i.e., it checks all metadata entries). I have recently added indexing to the SQL databases, so fetching information from them should be a lot faster. If you'd like, you could redownload the source code, remove all metadata.db files in your reference databases, and restart the check_sql command. This will also result in faster mantis runs further down the line, so it might be worth doing now.

Sorry for the inconvenience. This new method for metadata retrieval should be much better than the previous one (txt parsing), but I still had to iron out a few kinks to make it efficient. Hopefully everything should now be working as intended.

Regards, Pedro
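
To illustrate why adding an index speeds up lookups, the sketch below compares SQLite's query plan for the same lookup before and after creating an index. The table, column, and index names are hypothetical and are not taken from Mantis's schema.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE metadata (ref TEXT, description TEXT)')  # hypothetical schema
conn.executemany(
    'INSERT INTO metadata VALUES (?, ?)',
    ((f'REF{i:06d}', f'description {i}') for i in range(10000)),
)


def query_plan(connection, ref):
    """Return SQLite's query plan text for a lookup by ref."""
    rows = connection.execute(
        'EXPLAIN QUERY PLAN SELECT description FROM metadata WHERE ref = ?', (ref,)
    ).fetchall()
    return ' | '.join(row[-1] for row in rows)  # last column holds the plan detail


plan_before = query_plan(conn, 'REF004242')  # full table scan
conn.execute('CREATE INDEX idx_metadata_ref ON metadata (ref)')
plan_after = query_plan(conn, 'REF004242')   # lookup through the index

conn.close()
```

Without the index, every lookup has to scan the whole table; with it, SQLite can search directly for the matching key, which matters when a check or an annotation run performs a very large number of lookups.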

mhyleung commented 2 years ago

Hey Pedro

Alright. I will try again and let you know.

M

mhyleung commented 2 years ago

OK, that was indeed orders of magnitude faster :). I have already set up the databases, and my check_sql command seemed to run alright. I shall do a trial run of my small sample now. I will keep you updated!

M

mhyleung commented 2 years ago

Hi Pedro

I believe the run went alright, with the following output files?

.
├── consensus_annotation.tsv
├── integrated_annotation.tsv
├── Mantis.out
└── output_annotation.tsv

Cheers

Marcus

PedroMTQ commented 2 years ago

Hey Marcus,

Glad to hear that. Yes, those are the output files. There are also a few other output files you could generate, but this depends on what kind of analysis you want to do. Anyway, if you are curious, please check the wiki, where I describe those as well as the default output files.

Regards, Pedro
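
A quick way to confirm that a run produced the default output files listed above is sketched below. The file names are taken from the directory listing in this thread; `missing_outputs` is a hypothetical helper, not part of Mantis, and the demonstration uses a temporary folder in place of a real output directory.

```python
import tempfile
from pathlib import Path

# Default output files as listed in the thread above.
EXPECTED_OUTPUTS = (
    'consensus_annotation.tsv',
    'integrated_annotation.tsv',
    'Mantis.out',
    'output_annotation.tsv',
)


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in expected if not (out / name).is_file()]


# Demonstration: create three of the four files and report what is missing.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_OUTPUTS[:3]:
        (Path(tmp) / name).touch()
    still_missing = missing_outputs(tmp)
```

An empty list from `missing_outputs` only shows that the files exist; it says nothing about whether their contents are complete.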