Oshlack / necklace

Combine reference and assembled transcriptomes for RNA-Seq analysis
https://github.com/Oshlack/necklace/wiki
GNU General Public License v3.0

de novo assembly fasta, reads_R2 error #13

Open stephanyfoster opened 4 years ago

stephanyfoster commented 4 years ago

hello,

I want to run necklace using de novo assembly fasta files. I am running necklace with the -p option and have created the configuration file with the path to the de_novo_assembly_files, but I repeatedly get this error: "A variable referred to in your script on line 27, 'reads_R2' was not defined."

How do I circumvent this?

nadiadavidson commented 4 years ago

Hi, you need to specify the short-read paths, reads_R1 and reads_R2, in the config file so that Necklace can perform genome-guided assembly and count reads mapping to genes (even if you've already done the de novo assembly). If your data is single-end you can just set: reads_R2="" If you are doing this and still get an error, please send me your config file and the error output.

Cheers, Nadia.

stephanyfoster commented 4 years ago

thanks for your timely response.

Is it necessary to have the reads? Could I run necklace with "dummy" fastq's? I am having trouble finding the reads as the de novo assemblies were done several years ago...

nadiadavidson commented 4 years ago

You could try dummy fastqs although I suspect completely empty files might cause a few errors in Necklace, so best to provide some real or simulated reads from the same organism (but doesn't need to be many samples or high coverage).

stephanyfoster commented 4 years ago

I got a hold of the fastq files, and now I am getting a different error referring to my config file:

expecting EOF, found ','

That refers to a comma that is present because I am including more than one de novo assembly file. My script looks like this:

// de_novo_assembly_file="/sf6/xxx.BlastRef.fa,/sf6/xxx.BlastRef.fa,/sf6/ ....etc"

nadiadavidson commented 4 years ago

Hi,

Yes, you can only pass one de novo assembly file to necklace, but a simple way around this is to join all the assemblies into one file, e.g. with "cat /sf6/*.BlastRef.fa > all_assemblies.fasta". This should work provided that no two assemblies share a contig ID.

Cheers, Nadia.
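Before concatenating, the "no shared contig ID" condition can be checked up front. The following is a minimal sketch, not part of Necklace: the function names `fasta_ids` and `duplicated_ids` are made up for illustration, and it assumes the contig ID is the first whitespace-delimited token after ">" on each header line.

```python
from collections import defaultdict


def fasta_ids(path):
    """Yield the ID (first token after '>') of every header line in a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""


def duplicated_ids(paths):
    """Return {contig_id: [files containing it]} for IDs that occur more than once."""
    seen = defaultdict(list)
    for path in paths:
        for contig_id in fasta_ids(path):
            seen[contig_id].append(str(path))
    return {cid: files for cid, files in seen.items() if len(files) > 1}
```

Calling `duplicated_ids` on the list of assembly files returns an empty dictionary when the files are safe to join with cat. Any entries it does return name the IDs that would need renaming first.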

stephanyfoster commented 4 years ago

hello-

I joined the assemblies and tried again. I am now getting errors like: A variable referred to in your script on line 3, 'sf6' was not defined. or A variable referred to in your script on line 3, 'all_assemblies' was not defined.

I am not sure why it is not recognizing the path or the name of the assembly file for what it is?

Thanks for your continued help

nadiadavidson commented 4 years ago

Hi, did you put quotes around the file name? Can you send the parameter you set?

stephanyfoster commented 4 years ago

I did put quotes.. here is the command I run: /users/sf6/data/necklace-1.11/tools/bin/bpipe run -p /sf6/data/necklace-1.11/necklace.groovy necklace.txt

and my config file: // sequencing data reads_R1="SRR_1.fastq" reads_R2="SRR_2.fastq”// de_novo_assembly_file="all_assemblies.fasta” //The genome and annotation genome=“GCA_000genomic.fa” //The genome and annotation of a related species genome=“GCF_000_genomic.fa” //

The file has no line breaks because I also got error messages about new lines before.

nadiadavidson commented 4 years ago

Hi, Can you try uncommenting the de_novo_assembly_file part in your config file: // sequencing data reads_R1="SRR_1.fastq" reads_R2="SRR_2.fastq” de_novo_assembly_file="all_assemblies.fasta” //The genome and annotation genome=“GCA_000genomic.fa” //The genome and annotation of a related species genome=“GCF_000_genomic.fa” //

Then run the necklace command like this: /users/sf6/data/necklace-1.11/tools/bin/bpipe run /sf6/data/necklace-1.11/necklace.groovy necklace.txt The -p is only needed if you specify an argument after it, e.g. -p de_novo_assembly_file="all_assemblies.fasta”. I suspect this is what's causing your errors.

If you still get an error after trying the suggestions above, try adding the full path of the de novo assembly file.

Let me know if this helps.

Cheers, Nadia.

stephanyfoster commented 4 years ago

I've incorporated your suggestions. The error I receive now is: Could not understand command run /users/sf6/data/necklace-1.11/necklace.groovy or find it as a file

I've tried reinstalling necklace, this time within the sf6 directory, both with the same command as before and with a shortened path, and I get the same error. Are you familiar with this?

thanks again for your help!!

nadiadavidson commented 4 years ago

No, I've never seen an error like this before. Just to confirm, did you run the full command (including the bpipe at the start)?

/users/sf6/data/necklace-1.11/tools/bin/bpipe run /sf6/data/necklace-1.11/necklace.groovy necklace.txt

What happens if you just run: /users/sf6/data/necklace-1.11/tools/bin/bpipe Does it print usage information?

Another idea is to provide the full path to necklace.txt. You could also try providing the full path to the files inside your config file if you continue to get errors (although I don't think this is causing the current problem).

Hopefully we can work this out and get Necklace running. Thanks for your patience.

Cheers, Nadia.

stephanyfoster commented 4 years ago

Hi Nadia,

yes, I had run the full command. I re-installed the demo data and started from the beginning with everything in the same directory. I was able to successfully run Necklace on the demo data using ./necklace-1.11/tools...etc. I've tried the same now with my own config file. The new error is: WARN: Error evaluating script necklace.txt: No such property: assemblies for class: necklace

Pipeline Failed!

A variable referred to in your script on line 27, 'reads_R2' was not defined.

"assemblies" refers to the de novo assembly fastas that I joined into "assemblies.fasta". Nothing has changed in my config file, but I get the same error whether I comment or uncomment the de_novo_assembly_file part.

thank you!

nadiadavidson commented 4 years ago

Did you put quotes around the filename, i.e. "assemblies.fasta", in the config file? I just realised I had left these off in the wiki instructions, so apologies for that. Quotes are required, and your error sounds a bit like they are missing.

stephanyfoster commented 4 years ago

I did put quotes around all of the file names. I've looked over my config file and retested, and I no longer get the warning about "no such properties". I do still get an error stating: A variable referred to in your script on line 27, 'reads_R2' was not defined.

For reference here is what is in my config file again and it is all on one line: // sequencing data reads_R1="SRR1138705_1.fastq" reads_R2="SRR1138705_2.fastq” //de_novo_assembly_file="assemblies.fa” //The genome and annotation genome=“GCAgenomic.fa” //The genome and annotation of a related species genome=“GCFgenomic.fa” //

nadiadavidson commented 4 years ago

Hmm, the variables all need to be on their own line in the config file (not one line). The "//" means it's a comment, so the code will interpret everything after as a comment and not process it as the location of all the files it needs. What was the reason again that you couldn't have everything on a separate line? What error do you get when you try that?
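The effect of "//" on a one-line config can be illustrated with a rough approximation. This sketch is not how bpipe or Groovy actually parse the file. It only mimics the rule described above: everything after "//" on a line is ignored. The helper names `strip_comment` and `defined_variables` and the sample file names are hypothetical.

```python
import re

# A variable assignment at the start of a line, e.g. reads_R1="..."
ASSIGNMENT = re.compile(r"^\s*(\w+)\s*=")


def strip_comment(line):
    """Drop a trailing // comment, ignoring any // inside straight double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and line.startswith("//", i):
            return line[:i]
    return line


def defined_variables(config_text):
    """Return the names assigned outside comments, assuming one assignment per line."""
    names = []
    for line in config_text.splitlines():
        match = ASSIGNMENT.match(strip_comment(line))
        if match:
            names.append(match.group(1))
    return names
```

Under this approximation, a config written entirely on one line that begins with "// sequencing data" defines no variables at all, which is consistent with the "'reads_R2' was not defined" message. The same content split across lines yields reads_R1 and reads_R2.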

stephanyfoster commented 4 years ago

Here is the error I get: WARN: Error evaluating script necklace2.txt: startup failed: necklace2.txt: 3: expecting anything but ''\n''; got it anyway @ line 3, column 31. reads_R2="SRR1138705_2.fastq” ^

1 error

Pipeline Failed!

A variable referred to in your script on line 27, 'reads_R2' was not defined.

Now, my variables are on their own line, the file looks like this:

// sequencing data
reads_R1="SRR1138705_1.fastq"
reads_R2="SRR1138705_2.fastq”
// de_novo_assembly_file="assemblies.fa”

thanks again for taking the time to help with this!

nadiadavidson commented 4 years ago

That's strange, I haven't encountered an error like that before. Is that the full error that was printed? What command do you run? What system are you running on? Are you using bash shell? etc. It looks like it might be something to do with the environment to me.

stephanyfoster commented 4 years ago

I am using bash shell on a Mac. I wrote the config file using TextEdit and made it plain text. I will paste the command and error below:

Command: ./necklace-1.11/tools/bin/bpipe run ./necklace-1.11/necklace.groovy necklace2.txt

Error: WARN: Error evaluating script necklace2.txt: startup failed: necklace2.txt: 3: expecting anything but ''\n''; got it anyway @ line 3, column 31. reads_R2="SRR1138705_2.fastq” ^

1 error

Pipeline Failed!

A variable referred to in your script on line 27, 'reads_R2' was not defined.

Please check that all pipeline stages or other variables you have referenced by this name are defined.

stephanyfoster commented 4 years ago

...unbelievable... I think the error was due to the quotation marks. In this line the closing quotation mark is different from the opening one (reads_R2="SRR1138705_2.fastq”), and I'm not sure how I managed to do that, but I was repeatedly getting an error there. So I had a good laugh after fixing the quotes, before the next error.
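Mismatched quotation marks like this are hard to spot by eye, so a small scan for typographic ("curly") quotes can help. This is an illustrative sketch, not a Necklace or bpipe feature, and the name `find_curly_quotes` is hypothetical. It reports the 1-based line and column of each curly quote in the config text.

```python
# Typographic quotes that word processors may substitute for straight ones.
CURLY_QUOTES = {
    "\u201c": "left double quotation mark",
    "\u201d": "right double quotation mark",
    "\u2018": "left single quotation mark",
    "\u2019": "right single quotation mark",
}


def find_curly_quotes(config_text):
    """Return (line, column, name) for each typographic quote, both 1-based."""
    hits = []
    for line_no, line in enumerate(config_text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in CURLY_QUOTES:
                hits.append((line_no, col_no, CURLY_QUOTES[ch]))
    return hits
```

Reading the config with `open(path, encoding="utf-8").read()` and passing the text to `find_curly_quotes` should flag the closing quote on the reads_R2 line. Each hit can then be replaced with a straight double quote in a plain-text editor.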

Error: Expected one or more inputs with extension 'gz' but none could be located from pipeline.

Should I zip my fastq files?

Thank you!

nadiadavidson commented 4 years ago

Wow... glad you finally found it. How frustrating. The next error is much simpler. Just gzip your fastq files and it should be happy.
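For completeness, the compression step can also be scripted. The sketch below uses only the Python standard library. The function name `gzip_file` is hypothetical. Unlike the gzip command line tool, which replaces the original file by default, it leaves the uncompressed file in place.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path):
    """Write a gzip-compressed copy of path next to it as <name>.gz and return that path."""
    source = Path(path)
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

After compressing, reads_R1 and reads_R2 in the config file would presumably need to point at the new .fastq.gz names.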