TheBrownLab / PhyloFisher

PhyloFisher is a software package written in Python3 that can be used for the creation, analysis, and visualization of phylogenomic datasets that consist of eukaryotic protein sequences.
MIT License

Errors running sgt_constructor.py: `Error in rule prequal`, `Error in rule length_filter_mafft` #99

Closed mjamy closed 1 year ago

mjamy commented 1 year ago

Hello,

I'm using PhyloFisher version 1.2.11. I prepared a custom database, and I am getting several errors when running sgt_constructor.py. An example is below:

Error in rule length_filter_mafft:
    jobid: 1737
    output: /cluster/projects/nn8118k/mahwash/euk-niche-evolution/00_global_tree/00_data/fisher_to_matrix/sgt_constructor_out_Sep.20.2023/length_filtration/mafft/mcm7.aln
    log: /cluster/projects/nn8118k/mahwash/euk-niche-evolution/00_global_tree/00_data/fisher_to_matrix/sgt_constructor_out_Sep.20.2023/logs/length_filter_mafft/mcm7.log (check log file(s) for error message)
    conda-env: /cluster/projects/nn8118k/mahwash/euk-niche-evolution/00_global_tree/00_data/fisher_to_matrix/.snakemake/conda/56b71125409b5153869b28cac9761feb
    shell:
        mafft --thread 1 --globalpair --maxiterate 1000 --unalignlevel 0.6 /cluster/projects/nn8118k/mahwash/euk-niche-evolution/00_global_tree/00_data/fisher_to_matrix/sgt_constructor_out_Sep.20.2023/prequal/mcm7.aa.filtered >/cluster/projects/nn8118k/mahwash/euk-niche-evolution/00_global_tree/00_data/fisher_to_matrix/sgt_constructor_out_Sep.20.2023/length_filtration/mafft/mcm7.aln 2>/cluster/projects/nn8118k/mahwash/euk-niche-evolution/00_global_tree/00_data/fisher_to_matrix/sgt_constructor_out_Sep.20.2023/logs/length_filter_mafft/mcm7.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job length_filter_mafft since they might be corrupted:
/cluster/projects/nn8118k/mahwash/euk-niche-evolution/00_global_tree/00_data/fisher_to_matrix/sgt_constructor_out_Sep.20.2023/length_filtration/mafft/mcm7.aln

I've tried running the script several times and always get the same error.

I've attached the log file of the run: 2023-09-20T111748.098705.snakemake.log

Here is the .tar.gz file of the logs directory in sgt_constructor_out_Sep.20.2023 from the run (I started the run again, so there might be some extra files): logs.tar.gz

Any help would be appreciated!

Thank you, Mahwash

robert-ervin-jones commented 1 year ago

Hi Mahwash,

Thanks for reaching out. We are not able to resolve this based on the log files alone. Could you share your starting sequence files for POLR2B and mcm7?

Best, Robert

mjamy commented 1 year ago

Hi Robert,

Sure! Did you mean the files from the working_dataset_constructor_out folder?

Here they are: fasta.tar.gz

Best, Mahwash

atice commented 1 year ago

Hi Mahwash,

Yes, these are what we needed. Thank you. We will look into it.

P.S. Could you give us a bit of insight as to where the genes for your custom database came from, such as an OrthoFinder run or an all-vs-all BLAST followed by MCL clustering?

P.P.S. Have you run sgt_constructor.py successfully on this cluster before?

Thanks, Alex

mjamy commented 1 year ago

Hi Alex,

Sure. The database is based on the 320-gene dataset from Strassert et al. 2021, to which I added more taxa from Tikhonenkov et al. 2022. I used the initial fasta files and cleaned fasta files from the studies to separate orthologs and paralogs.

This is the first time I'm running PhyloFisher on this cluster, so the problem could be related to that!

Thanks, Mahwash

robert-ervin-jones commented 1 year ago

Hi Mahwash,

Both POLR2B and mcm7 finished fine for me. I think something on the cluster may be prematurely terminating the processes. There isn't any error in the logs; they just end abruptly. It would probably be worthwhile to reach out to a system administrator for your cluster and see if they can help you pinpoint what is happening.

Let me know if there is anything else we can help with.

Best, Robert
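
For anyone hitting the same symptom (logs that simply stop, with no error message), one thing that may help is looking at the exit status of the failed command. The sketch below is only an illustration and is not part of PhyloFisher or Snakemake; `describe_exit` is a hypothetical helper name. It relies on two general conventions: Python's `subprocess` reports a child killed by signal N as return code -N, and shells usually report it as 128 + N.

```python
import signal


def describe_exit(code: int) -> str:
    """Give a rough, human-readable reading of a process exit status.

    Hypothetical helper for illustration only. A status above 128 is
    treated as "128 + signal number", which is the usual shell convention
    but not a guarantee: a program may also choose such a status itself.
    """
    if code == 0:
        return "success"
    if code < 0:
        # subprocess convention: -N means "terminated by signal N"
        signum = -code
    elif code > 128:
        # shell convention: 128 + N means "terminated by signal N"
        signum = code - 128
    else:
        return f"exited with status {code}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a known signal number on this platform
        return f"exited with status {code}"
    return f"killed by {name}"


if __name__ == "__main__":
    for status in (0, 1, 137, -9):
        print(status, "->", describe_exit(status))
```

A status that maps to SIGKILL is consistent with the process being terminated from outside (for example by a scheduler or the kernel), but it does not prove the cause by itself, so it is still worth asking the cluster administrators.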

mjamy commented 1 year ago

Hi Robert,

Good to know where the trouble lies. Many thanks for your help!

Best, Mahwash

mjamy commented 1 year ago

Hi again!

Just wanted to post an update. Turns out it was just a stupid mistake on my part, and the error went away when I increased the memory allocation.

Best, Mahwash
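
In case it is useful to others sizing a memory request, here is a minimal sketch of how one might measure the peak memory of a single command before submitting a full run. It is an assumption-laden illustration, not something PhyloFisher provides: `run_and_measure` is a hypothetical helper, it only works on Unix-like systems (it uses the standard `resource` module), and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource
import subprocess
import sys


def run_and_measure(cmd):
    """Run a command and return (return code, peak RSS of child processes).

    Hypothetical helper for illustration. The peak value is the high-water
    mark over all children this Python process has waited for so far, so
    call it from a fresh interpreter to measure one command on its own.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Stand-in command; replace with the real alignment command to test.
    returncode, peak = run_and_measure([sys.executable, "-c", "pass"])
    print(f"return code: {returncode}, peak RSS: {peak}")
```

Running the failing alignment command through something like this on a node with plenty of memory would give a rough figure to base the allocation on.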

atice commented 1 year ago

Hi Mahwash,

Glad to hear you have it running now! Thanks for the update.

Alex
