AntonelliLab / seqcap_processor

Bioinformatic pipeline for processing Sequence Capture data for Phylogenetics
MIT License

Problem with loci alignments #35

Open PierreHenriFabre opened 1 year ago

PierreHenriFabre commented on 9 Jan 2023

Hi, first of all, thanks for the secapr pipeline. I am analysing a gene capture dataset of 60 specimens and 484 genes, and I have an issue with the locus alignments. When I run secapr_alignment, the pipeline always processes only 40 genes, even though the previous assembly and BLAST-against-targets steps recovered most of the genes. I installed an alternative version of secapr (the one installed with pip instead of the conda installer). This second version works for the alignment step but not for the assembly and BWA steps. Is there a way to change this 40-gene limit, and which script should I look at to fix the problem?

INFO: Note: NumExpr detected 40 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO: NumExpr defaulting to 8 threads.
[WARNING] Output directory exists, REMOVE [Y/n]? Y
Aligning sequence collections 39/40
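The two NumExpr lines in this log are INFO messages that the NumExpr library emits when it is imported. By default it caps itself at a safe thread count (8 here) unless the NUMEXPR_MAX_THREADS environment variable is set. The sketch below is a minimal example that assumes you launch secapr from a Python wrapper. It raises that cap for a child process. The helper names are illustrative and are not part of secapr.

```python
import os
import subprocess
from typing import Dict, List, Optional


def env_with_numexpr_threads(threads: Optional[int] = None) -> Dict[str, str]:
    """Return a copy of the current environment with NUMEXPR_MAX_THREADS set.

    If no value is given, use os.cpu_count(), falling back to 1 when it is unknown.
    """
    n = threads if threads is not None else (os.cpu_count() or 1)
    env = dict(os.environ)  # copy, so the parent process environment is untouched
    env["NUMEXPR_MAX_THREADS"] = str(n)
    return env


def run_with_threads(cmd: List[str], threads: Optional[int] = None) -> int:
    """Run a command (for example a secapr call) with the NumExpr thread cap raised."""
    return subprocess.run(cmd, env=env_with_numexpr_threads(threads)).returncode
```

From a shell you can get the same effect by exporting the variable (for example `export NUMEXPR_MAX_THREADS=40`) before calling secapr.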

Thanks for your help, and I apologize if I made a mistake while running the pipeline.
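One way to find where genes are lost is to count the distinct loci in the FASTA file that feeds the alignment step. The sketch below is a minimal example under a hypothetical assumption: each header starts with the locus name followed by an underscore (for example `>locus123_sampleA`). If your secapr version writes headers differently, adjust `locus_from_header` to match.

```python
from collections import Counter
from typing import Callable, Iterable


def locus_from_header(header: str) -> str:
    """Hypothetical header parser: return the text before the first underscore.

    Assumes headers like '>locus123_sampleA'. It will not work correctly if
    locus names themselves contain underscores.
    """
    return header.lstrip(">").split("_", 1)[0].strip()


def count_loci(lines: Iterable[str],
               parse: Callable[[str], str] = locus_from_header) -> Counter:
    """Count the sequences per locus in FASTA-formatted lines (header lines only)."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith(">"):
            counts[parse(line)] += 1
    return counts
```

Usage: open the FASTA with `with open(path) as handle: loci = count_loci(handle)`, then compare `len(loci)` with 484. If the count is close to 484, the drop to 40 is happening inside the alignment step rather than earlier.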

tandermann commented on 17 Jan 2023

Which version of secapr are you running (check secapr -v)?

Best, Tobias


Tobias Andermann, PhD, Assistant Professor, Data-Driven Life Sciences Fellow (https://www.scilifelab.se/data-driven/fellows/)

Department of Organismal Biology, SciLifeLab, Uppsala University, Sweden

@.**@.> | +46 76 090 1106 | github.com/tandermann (https://github.com/tandermann) | Google Scholar profile (https://scholar.google.com/citations?user=OxZM3SwAAAAJ&hl=en)


CAUTION: Do not click on links or open attachments unless you recognise the sender and know the content is safe.


E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy
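If you are comparing several environments (a conda environment, a pip install, separate container images), you can also read the installed version from the package metadata. The sketch below is a minimal example that uses the standard-library `importlib.metadata` (Python 3.8+). It reports the version of the distribution importable by the current interpreter. That could differ from what `secapr -v` prints if the secapr on your PATH belongs to a different environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "secapr") -> Optional[str]:
    """Return the installed distribution's version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```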

PierreHenriFabre commented on 17 Jan 2023

Dear Tobias,

Thanks for your reply. Running secapr --version gives me 2.1.0.

I installed the version from GitHub with miniconda3. This version works for most of the modules, apart from alignment and plotting. For those two I installed the pip version (on a different image). I had some problems running the other modules with the pip-installed version.

Thanks for your help. It would be very helpful to have a working version to analyze my exon capture data. Cheers, Pierre-Henri
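With two installs in separate images, it is easy to call a different secapr than the one you intended. The sketch below is a small example to run inside each image. It prints which Python interpreter and which `secapr` executable are active. It only inspects PATH and does not call secapr, and the helper names are illustrative.

```python
import shutil
import sys
from typing import Optional


def active_executable(name: str = "secapr") -> Optional[str]:
    """Return the full path that `name` resolves to on PATH, or None if it is not found."""
    return shutil.which(name)


def describe_environment(name: str = "secapr") -> str:
    """Summarise the active Python interpreter and the executable found for `name`."""
    path = active_executable(name)
    return (f"python: {sys.executable} ({sys.version.split()[0]})\n"
            f"{name}: {path if path else 'not found on PATH'}")
```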


tandermann commented on 17 Jan 2023

I see, that is an outdated version and it has some bugs. Try installing the latest development version by following the instructions at the end of the README on GitHub (https://github.com/AntonelliLab/seqcap_processor). Let me know if you run into problems.

Best, Tobias



PierreHenriFabre commented 1 year ago

Dear Tobias,

Thanks for your help. I just installed this version, and it works for the alignment step of my 484 exons. However, it gives me error messages when I try to use the assembly module and the remapping modules.

For the assembly, I got this message:

command: secapr assemble_reads --input /media/bigvol/phfabre/analyse_NGS/exon_capture/Thomasomys/secapr/Thomasomys_run1/reads --output contigs_test

output:
##################################################
Processing sample T_musm24300_clean
De-novo assembly with spades of sample T_musm24300_clean:
Building contigs........
T_musm24300_clean assembled. Statistics are printed into /media/bigvol/phfabre/analyse_NGS/exon_capture/Thomasomys/secapr/Thomasomys_run1/contigs_test/stats/T_musm24300_clean/T_musm24300_clean_spades_screen_out.txt
cp: cannot stat '/media/bigvol/phfabre/analyse_NGS/exon_capture/Thomasomys/secapr/Thomasomys_run1/contigs_test/stats/T_musm24300_clean/contigs.fasta': No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/secapr", line 11, in <module>
    load_entry_point('secapr==0+unknown', 'console_scripts', 'secapr')()
  File "/usr/local/lib/python3.8/dist-packages/secapr-0+unknown-py3.8.egg/secapr/main.py", line 55, in main
  File "/usr/local/lib/python3.8/dist-packages/secapr-0+unknown-py3.8.egg/secapr/assemble_reads.py", line 195, in main
  File "/usr/local/lib/python3.8/dist-packages/secapr-0+unknown-py3.8.egg/secapr/assemble_reads.py", line 195, in
  File "/usr/local/lib/python3.8/dist-packages/secapr-0+unknown-py3.8.egg/secapr/assemble_reads.py", line 174, in process_subfolder
  File "/usr/local/lib/python3.8/dist-packages/secapr-0+unknown-py3.8.egg/secapr/assemble_reads.py", line 124, in get_stats_spades
  File "/usr/local/lib/python3.8/dist-packages/secapr-0+unknown-py3.8.egg/secapr/assemble_reads.py", line 108, in count_contigs
FileNotFoundError: [Errno 2] No such file or directory: '/media/bigvol/phfabre/analyse_NGS/exon_capture/Thomasomys/secapr/Thomasomys_run1/contigs_test/stats/T_musm24300_clean/../../T_musm24300_clean.fa'
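In this log the failure begins before the Python traceback. `cp` cannot find `contigs.fasta` in the sample's stats folder, so the later attempt to open `T_musm24300_clean.fa` fails as well. That suggests SPAdes did not write its contigs file for this sample. The SPAdes screen-output file named in the log is probably the first place to look. The sketch below is a hypothetical pre-check along those lines. It takes the folder layout from the paths in the log and is not part of secapr.

```python
from pathlib import Path


def check_spades_output(sample_stats_dir: str) -> int:
    """Check that SPAdes produced contigs.fasta in a sample's stats folder.

    Returns the number of contigs (FASTA header lines). If the file is missing,
    raises FileNotFoundError that points to the sample's SPAdes screen-output file.
    """
    folder = Path(sample_stats_dir)
    contigs = folder / "contigs.fasta"
    if not contigs.is_file():
        logs = sorted(str(p) for p in folder.glob("*_spades_screen_out.txt"))
        hint = logs[0] if logs else "no *_spades_screen_out.txt found"
        raise FileNotFoundError(f"{contigs} is missing; check SPAdes log: {hint}")
    with contigs.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))
```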

For the remapping, I got this message:

command: secapr reference_assembly --reads reads --reference_type alignment-consensus --reference SECAPR_ALI_TRIM --output remapped_reads --min_coverage 4

Creating consensus sequences from input alignments... Done.

##################################################
Processing sample T_incmusm43649_clean
Traceback (most recent call last):
  File "/usr/local/bin/secapr", line 11, in <module>
    load_entry_point('secapr==0+unknown', 'console_scripts', 'secapr')()
  File "/usr/local/lib/python3.8/dist-packages/secapr-0+unknown-py3.8.egg/secapr/main.py", line 55, in main
  File "/usr/local/lib/python3.8/dist-packages/secapr-0+unknown-py3.8.egg/secapr/reference_assembly.py", line 826, in main
  File "/usr/local/lib/python3.8/dist-packages/secapr-0+unknown-py3.8.egg/secapr/reference_assembly.py", line 271, in mapping_bwa
IndexError: list index out of range
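Here `mapping_bwa` fails at line 271 with `IndexError: list index out of range`, which means it indexed a list that was shorter than expected. One plausible but unconfirmed cause is that the read files for T_incmusm43649_clean did not match the file names this version looks for in the `--reads` folder. The sketch below is a hypothetical sanity check of that folder to run before rerunning. The READ1/READ2 name patterns are assumptions and are not taken from secapr's source.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def samples_missing_reads(
    reads_dir: str,
    patterns: Tuple[str, ...] = ("*READ1*.fastq*", "*READ2*.fastq*"),
) -> Dict[str, List[str]]:
    """Report, for each sample subfolder, which expected read-file patterns match nothing.

    The default patterns are hypothetical. Replace them with the file names
    that your clean_reads step actually produced.
    """
    missing: Dict[str, List[str]] = {}
    for sample in sorted(p for p in Path(reads_dir).iterdir() if p.is_dir()):
        absent = [pat for pat in patterns if not any(sample.glob(pat))]
        if absent:
            missing[sample.name] = absent
    return missing
```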

I hope I did not make a silly mistake with the installation or my inputs. The previous version I used worked for these two steps. I used Python 3.8 to build this version of secapr on Ubuntu.

Thanks a lot for your help and time,

Pierre-Henri


mlaize commented 1 year ago

Hello Pierre-Henri,

I now have exactly the same problem with reference_assembly (after the same update that fixed the align_sequences bug). Did you manage to get past it? I would be grateful for any tips you can give me. Also, secapr --version gives me "secapr 0+unknown".

Regards,

Mathias