Trinity mutation pipeline error

annaquaglieri16 commented 8 years ago

Hello there, I am trying to run the rnaseq_mutation_pipeline.py using the instructions provided in the Readme.txt. However, I keep getting an error regarding my input .fa files. Could you provide an example or clarify what you mean by "2 paired RNASeq samples ( fasta or fasta.qz)"?

Do you mean two different paired-end RNA-seq samples (for example, biological replicates)? I am running it this way:

python ~/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py \
    --reference ~/hg19.fa \
    --vcf snp.vcf \
    --left ~/sample1_1.fa ~/sample2_1.fa \
    --right ~/sample1_2.fa ~/sample2_2.fa \
    .....

And I keep getting the same error:

rnaseq_mutation_pipeline.py: error: unrecognized arguments: ~/sample2_1.fa ~/sample2_2.fa

I have tried to run it with only sample1, but I get this error:

Traceback (most recent call last):
  File "/home/users/allstaff/quaglieri.a/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py", line 1625, in <module>
    run( args )
  File "/home/users/allstaff/quaglieri.a/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py", line 1244, in run
    pline_cur = pline_cur, f_index_only = f_do_index )
  File "/home/users/allstaff/quaglieri.a/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py", line 150, in func_do_star_alignment
    lcmd_commands.extend( [ Command.Command( str_cur_command = " ".join( [ "cd", str_align_dir_1 ] ) ),
TypeError: __init__() takes exactly 4 arguments (2 given)

Do you have any suggestions?

Anna

TimothyTickle commented 8 years ago

Hello Anna,

Thank you so much for the feedback; this will help us make the ReadMe.txt clearer. The intended meaning is that each sample is expected to have a pair of fasta or fastq files, that is, two files per sample. These are often referred to as R1 and R2; we put R1 in the left and R2 in the right, but this is just convention. If you have two samples (4 fasta or 4 fastq files), you will need to call the program twice (once for each pair).
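The "unrecognized arguments" message from the first attempt has the format produced by Python's argparse module. As a minimal sketch, assuming (the thread does not confirm this) that the pipeline declares --left and --right as options taking a single value each, the stand-in parser below shows why the second file after each flag is rejected. The parser and file names are illustrative and are not taken from rnaseq_mutation_pipeline.py.

```python
import argparse

# Hypothetical stand-in for the pipeline's parser: each option takes one value.
parser = argparse.ArgumentParser(prog="rnaseq_mutation_pipeline.py")
parser.add_argument("--left")
parser.add_argument("--right")

# Same shape as the failing call: two files after each flag.
argv = ["--left", "sample1_1.fa", "sample2_1.fa",
        "--right", "sample1_2.fa", "sample2_2.fa"]

# parse_known_args returns the leftovers instead of exiting with an error,
# which makes it easy to see what parse_args would have complained about.
known, extras = parser.parse_known_args(argv)
print(known.left, known.right)
print(extras)
```

The leftover values are exactly the second-sample files, which matches the arguments named in the error message.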

More specifically, if in your example ~/sample1_1.fa and ~/sample1_2.fa are paired files from the same sample (sample1), and likewise ~/sample2_1.fa and ~/sample2_2.fa for sample2, the commands would be:

python ~/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py \
    --reference ~/hg19.fa \
    --vcf snp.vcf \
    --left ~/sample1_1.fa \
    --right ~/sample1_2.fa \
    .....

and

python ~/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py \
    --reference ~/hg19.fa \
    --vcf snp.vcf \
    --left ~/sample2_1.fa \
    --right ~/sample2_2.fa \
    .....
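Running the pipeline once per sample can also be scripted. The following is a minimal sketch, assuming the file naming from the example above (<sample>_1.fa and <sample>_2.fa in one directory). The helper name build_command and the data_dir parameter are hypothetical, and the options elided with "....." above still have to be appended.

```python
import os
import shlex

# Script location taken from the commands above, with ~ expanded to a full path.
PIPELINE = os.path.expanduser(
    "~/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py")

def build_command(sample, data_dir="~"):
    """Return the argument list for one sample (one left/right pair)."""
    data_dir = os.path.expanduser(data_dir)
    return [
        "python", PIPELINE,
        "--reference", os.path.join(data_dir, "hg19.fa"),
        "--vcf", "snp.vcf",
        "--left", os.path.join(data_dir, sample + "_1.fa"),
        "--right", os.path.join(data_dir, sample + "_2.fa"),
        # Append the remaining pipeline options here.
    ]

commands = [build_command(s) for s in ["sample1", "sample2"]]
for cmd in commands:
    # Print for inspection; use subprocess.run(cmd, check=True) to execute.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each generated command carries exactly one --left file and one --right file, so every sample gets its own pipeline call.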

Please let me know if there are other points to clarify.

Extra tips:

We tend to use the absolute path and not the tilde/home in sample file names. If you have trouble with this (or not) please let me know. It will be interesting to get feedback on this use.
You are also welcome to use our publicly available, free-to-use installation of this software at https://galaxy.ncgas-trinity.indiana.edu.
We typically use this tool with fastq files but the underlying STAR aligner is documented to be able to use either.

Cordially,

Tim

annaquaglieri16 commented 8 years ago

Hi Tim, thanks a lot for your reply.

Unfortunately, I had already tried running it the way you explained. This is my actual run:

python /home/users/allstaff/quaglieri.a/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py \
    --reference /home/users/allstaff/quaglieri.a/PHD_project/GEO_Leucegene_data/genomes/hg19/hg19.fa \
    --vcf /home/users/allstaff/quaglieri.a/PHD_project/GEO_Leucegene_data/genomes/hg19/hg19_vcf/dbsnp_138.hg19.excluding_sites_after_129.vcf \
    --left /home/users/allstaff/quaglieri.a/PHD_project/GEO_Leucegene_data/sra/fastq/SRX729608_combined_1.fa \
    --right /home/users/allstaff/quaglieri.a/PHD_project/GEO_Leucegene_data/sra/fastq/SRX729608_combined_2.fa \
    --threads 15 \
    --cosmic_vcf /home/users/allstaff/quaglieri.a/PHD_project/GEO_Leucegene_data/genomes/hg19/hg19_vcf/CosmicCodingMuts.vcf \
    --radar /home/users/allstaff/quaglieri.a/PHD_project/GEO_Leucegene_data/genomes/hg19/hg19_vcf/database/Human_AG_all_hg19_v2.txt \
    --darned /home/users/allstaff/quaglieri.a/PHD_project/GEO_Leucegene_data/genomes/hg19/hg19_vcf/darned_hg19.txt \
    --log /home/users/allstaff/quaglieri.a/PHD_project/GEO_Leucegene_data/Trinity_mutation/log_trinity_try.txt \
    --tissue_type "blood" \
    --email quaglieri.a@wehi.edu.au \
    --cravat_annotation_header /home/users/allstaff/quaglieri.a/software/Trinity_CTAT-master/mutation/headers/cravat_annotation.txt \
    --is_hg19 \
    --out_dir /home/users/allstaff/quaglieri.a/PHD_project/GEO_Leucegene_data/Trinity_mutation/

And the error message is:

Traceback (most recent call last):
  File "/home/users/allstaff/quaglieri.a/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py", line 1625, in <module>
    run( args )
  File "/home/users/allstaff/quaglieri.a/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py", line 1244, in run
    pline_cur = pline_cur, f_index_only = f_do_index )
  File "/home/users/allstaff/quaglieri.a/software/Trinity_CTAT-master/mutation/rnaseq_mutation_pipeline.py", line 150, in func_do_star_alignment
    lcmd_commands.extend( [ Command.Command( str_cur_command = " ".join( [ "cd", str_align_dir_1 ] ) ),
TypeError: __init__() takes exactly 4 arguments (2 given)

I am not sure exactly which 4 arguments it needs, but it seems to me that it gives an error when it should start to run STAR.

Please let me know if you can help me with this error. I would really like to use the pipeline, since I was building exactly the same pipeline when I came across it and I would like to compare it with my results.

All the best,

Anna

TimothyTickle commented 8 years ago

Hello Anna,

Let's check the version of the underlying job-running engine, SciEDPipeR. There are a couple of commits that are tested with Trinity_CTAT.

On the command line (in a terminal), go to the directory where you cloned SciEDPipeR and try the command:

git log -1

This should show the last commit at the very top of the page (type q to exit). We are currently using the stable v.0.1.5, or commit 64c11c61504b78939c5278178fa58568f9ec85de.

Could you confirm that this matches the commit at the top of the page?

If not please type the command:

git checkout 64c11c61504b78939c5278178fa58568f9ec85de

and try again.
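The same check can be scripted. This is a minimal sketch, assuming git is available on the PATH; the function names are hypothetical and the repository directory is supplied by the caller. It relies on git rev-parse HEAD, which prints the full hash of the checked-out commit.

```python
import subprocess

# Commit recommended in this thread for use with Trinity_CTAT.
EXPECTED_COMMIT = "64c11c61504b78939c5278178fa58568f9ec85de"

def current_commit(repo_dir):
    """Return the full hash of the commit checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True)
    return result.stdout.strip()

def commit_matches(actual, expected=EXPECTED_COMMIT):
    """Compare two hashes, allowing either one to be an abbreviated prefix."""
    actual, expected = actual.strip().lower(), expected.strip().lower()
    if not actual or not expected:
        return False
    return actual.startswith(expected) or expected.startswith(actual)
```

Calling commit_matches(current_commit(repo_dir)) with the SciEDPipeR clone directory as repo_dir returns False when the clone is on a different commit, in which case the git checkout command above applies.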

You are welcome to email me directly for faster feedback.

Cordially,

Tim ttickle@broadinstitute.org

annaquaglieri16 commented 8 years ago

Hi Tim,

Thanks a lot for your help! That was the problem; the program has now been running for a while, so it should be fine. When I did the git clone, the commit was different from the one you mentioned, but after updating it, it seems to work!

Thank you again, All the best, Anna


Anna Quaglieri, PhD student Division of Bioinformatics - Speed lab, Walter & Eliza Hall Institute of Medical Research 1G Royal Parade, Parkville VIC 3052, Australia


annaquaglieri16 commented 8 years ago

Hi Tim,

I am still in the process of running your pipeline. I had to make a few changes to your code because it was giving me errors, like:

--filter_mismatching_base_and_quals in the split&trim step
-csv instead of -plots at the AnalyzeCovariates step, because it was giving me problems loading some R packages
remove -recoverDanglingHeads from HaplotypeCaller, because it is deprecated

Now I am stuck at another point and I am not sure how to fix it. I attached a snapshot of the error message. I reckon it has something to do with these lines of code. However, I have never done this before, so I am not too sure about your filtering options.

str_filter_command = " ".join( [ "/usr/local/bioinfsoftware/gatk/GenomeAnalysisTK-3.5.0/gatk -T VariantFiltration -R",
                                 args_call.str_genome_fa,
                                 "-V", str_variants_file,
                                 "-window 35",
                                 "-cluster 3 -filterName FS -filter \"FS > 30.0\" -filterName QD",
                                 "-filter \"QD < 2.0\" --out",
                                 str_filtered_variants_file ] )
cmd_variant_filteration = Command.Command( str_cur_command = str_filter_command,
                                           lstr_cur_dependencies = [ args_call.str_genome_fa ] + lstr_dependencies,
                                           lstr_cur_products = [ str_filtered_variants_file ] )

Do you have any suggestions?

Thank you,

Anna


TimothyTickle commented 8 years ago

Hello Anna,

It sounds like several of these edits stem from the environment the pipeline is being run in. I am guessing some of the edits to the GATK tool calls come from not having the supported version of the GATK (listed in the README.txt). Are you using the supported version, or are you adapting the code to a newer GATK? GATK is updated frequently, and we plan to catch up to the newest version in the near future. The errors with R would require you to install those packages in your R environment; --plot will stop secondary plots from being generated, which reduces the number of R packages needed.

I am not seeing the snapshot you mentioned or an error message. Could you try sending it again? I see a snippet of code but no error message.

Cordially,

Tim

TimothyTickle commented 8 years ago

Hello Anna,

Do you want to run the darned/radar and cosmic steps? If so, the resource files should match your reference genome in nomenclature. I just wanted to give you a heads up.
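One common form of such a mismatch is contig naming, for example "chr1" in one file and "1" in another. That reading of "nomenclature" is an assumption, since the thread does not spell it out. Under that assumption, a minimal sketch for comparing the sequence names in a reference FASTA with the chromosome column of a VCF could look like this; all function names are hypothetical and the data is a tiny in-memory example.

```python
def fasta_contigs(lines):
    """Collect sequence names from FASTA header lines (text after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names

def vcf_contigs(lines):
    """Collect CHROM values (first column) from VCF data lines."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0].strip())
    return names

def unmatched_contigs(resource_names, reference_names):
    """Return resource contigs that are absent from the reference."""
    return sorted(resource_names - reference_names)

# Tiny in-memory example; real files would be passed as open file handles.
reference = [">chr1 example", "ACGT", ">chr2", "TTGA"]
resource = ["##fileformat=VCFv4.1",
            "#CHROM\tPOS\tID\tREF\tALT",
            "1\t100\t.\tA\tG",
            "chr2\t200\t.\tC\tT"]
print(unmatched_contigs(vcf_contigs(resource), fasta_contigs(reference)))
```

In this example the record on contig "1" is reported because the reference only contains "chr1" and "chr2". An empty list means every contig in the resource file also exists in the reference.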

Cordially,

Tim

annaquaglieri16 commented 8 years ago

Hi Tim,

I am actually using a more recent version of GATK, GenomeAnalysisTK-3.5.0, while your README.txt specifies GenomeAnalysisTK-3.1-1-g. I didn't download it myself; it's simply the default on our servers. I overcame the "plot" problem by simply outputting the csv, which is OK.

I thought I had attached the snapshot to the previous email. I'll put it again.

Thank you a lot! All the best, Anna


TimothyTickle commented 8 years ago

Hello Anna,

I am not seeing the attachment on GitHub. If I am mistaken, please point it out so I can look at the error message. If you are not seeing the attachment either, you are welcome to send it to my email at ttickle@broadinstitute.org.

Cordially,

Tim

annaquaglieri16 commented 8 years ago

Sorry for that, can you see it now?

All the best, Anna


TimothyTickle commented 8 years ago

Hello Anna,

I was able to see the attachment in the email that was sent to me. This is an error from the underlying GATK tools and is due to the use of an unsupported version of the GATK with the pipeline. Some of the commands will have to be updated to the 3.5 version that you decided to use. Although the full tool error was cropped off at the top (so you will have to confirm this), I believe the tool that is not compatible is the VariantFiltration tool. The logs will tell you the last command that was issued.

Cordially,

Tim

annaquaglieri16 commented 8 years ago

Hi Tim, sorry for the delay! Thank you for your answer. As soon as I can, I'll go back and try to make it work, because I'm really curious to use your pipeline!

All the best, Anna

TrinityCTAT / Trinity_CTAT

Trinity mutation pipeline error #1