chrisjackson-pellicle / ParaGone

GNU General Public License v3.0

Error in align_selected_and_tree step #5

Closed by Lizzie-Roeble 2 months ago

Lizzie-Roeble commented 4 months ago

Hi, I am having an issue in the align_selected_and_tree step, specifically with align_selected_and_tree.py and the seq_to_keep variable:

[INFO]:    ======> ALIGNMENT AND TREE FROM SELECTED SEQUENCES <======

[INFO]:    External outgroup taxa: []
[INFO]:    Internal outgroup taxa: ['Psiadia_amygdalina_Reunion_L1001_GHIL',
           'Artemisia_kauaiensis_Kauai_L1176_J',
           'Helichrysum_proteoides_Mauritius_L814_MNOPQ']
[etc]
[INFO]:    Taxon Psiadia_amygdalina_Reunion_L1001_GHIL is an internal outgroup, and has
           more than one sequence for gene At1g21840. Only the sequence most divergent
           from the ingroup taxa will be retained.
Traceback (most recent call last):
  File "/home4/p300503/.conda/envs/paragone/bin/paragone", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home4/p300503/.conda/envs/paragone/lib/python3.11/site-packages/paragone/paragone_main.py", line 724, in main
    args.func(args,
  File "/home4/p300503/.conda/envs/paragone/lib/python3.11/site-packages/paragone/paragone_main.py", line 534, in full_pipeline_main
    align_selected_and_tree.main(
  File "/home4/p300503/.conda/envs/paragone/lib/python3.11/site-packages/paragone/align_selected_and_tree.py", line 776, in main
    outgroups_added_folder = add_outgroup_seqs(args.qc_alignment_directory,
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home4/p300503/.conda/envs/paragone/lib/python3.11/site-packages/paragone/align_selected_and_tree.py", line 119, in add_outgroup_seqs
    internal_outgroup_dict_filtered = filter_internal_outgroups(internal_outgroup_dict,
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home4/p300503/.conda/envs/paragone/lib/python3.11/site-packages/paragone/align_selected_and_tree.py", line 242, in filter_internal_outgroups
    logger.debug(f'Keeping sequence {seq_to_keep.name} with distance {seq_to_keep_distance}')
                                     ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'name'

I am running ParaGone version 0.0.14rc (July 2023) with 3 internal outgroups:

paragone full_pipeline paralog_input --internal_outgroup Artemisia_kauaiensis_Kauai_L1176_J --internal_outgroup Helichrysum_proteoides_Mauritius_L814_MNOPQ --internal_outgroup Psiadia_amygdalina_Reunion_L1001_GHIL --pool 1 --threads 1 --mo --rt --mi --keep_intermediate_files

I am testing the sensitivity of different outgroups, and my other paragone runs/outgroups have run smoothly.

Any thoughts on what is causing this issue? Let me know if you need any more information. Thanks, Lizzie

chrisjackson-pellicle commented 4 months ago

Hi Lizzie,

Hmm, looking at the code, I think this might be a bug that can occur when the outgroup sequences for a given taxon are identical to one of the ingroup sequences. Is that a possibility with your dataset?
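
For readers following along, here is a minimal sketch of how an error like `'NoneType' object has no attribute 'name'` could arise in a "keep the most divergent sequence" selection. It is not ParaGone's actual code: the `Seq` class, both function names, and the assumption that the selection uses a strict `>` comparison against a starting distance of 0.0 are hypothetical, chosen only to illustrate the suspected failure mode and one possible guard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seq:
    """Hypothetical stand-in for a sequence record (not ParaGone's class)."""
    name: str
    distance: float  # assumed: precomputed distance to the ingroup sequences


def pick_most_divergent_naive(seqs: list[Seq]) -> Optional[Seq]:
    """Keep the sequence with the largest distance, using a strict '>' test
    against a starting value of 0.0. If no distance exceeds 0.0 (for example,
    every candidate is identical to an ingroup sequence), nothing is ever
    assigned and None is returned."""
    seq_to_keep = None
    seq_to_keep_distance = 0.0
    for seq in seqs:
        if seq.distance > seq_to_keep_distance:
            seq_to_keep = seq
            seq_to_keep_distance = seq.distance
    return seq_to_keep


def pick_most_divergent_guarded(seqs: list[Seq]) -> Seq:
    """Same selection, but max() always returns a candidate. On ties,
    including all-zero distances, the first candidate is returned."""
    if not seqs:
        raise ValueError("No candidate sequences supplied")
    return max(seqs, key=lambda s: s.distance)


# Both outgroup copies are identical to an ingroup sequence (distance 0.0).
identical = [Seq("copy_1", 0.0), Seq("copy_2", 0.0)]

kept = pick_most_divergent_naive(identical)
print(kept)  # None; accessing kept.name here would raise AttributeError

kept = pick_most_divergent_guarded(identical)
print(kept.name)  # copy_1
```

Under these assumptions the naive version only fails when every candidate has zero distance, which would be consistent with the other outgroup combinations running without problems.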

Would you be comfortable sharing your input data and command for the failed run, so I can double check this and push a bugfix? If so, you could email it to me directly at chris.jackson@rbg.vic.gov.au.

Cheers,

Chris

Lizzie-Roeble commented 3 months ago

Hi Chris,

Yes, the issue was an ingroup sample being closely related to the outgroup. This was not intentional: the ingroup sample was mislabeled. I have now had a chance to rerun with the sample removed, and ParaGone runs as expected.

Thanks! Lizzie

chrisjackson-pellicle commented 2 months ago

Fixed in version 1.1.0