biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

Error with the parameter test script #108

Closed katieemelianova closed 1 year ago

katieemelianova commented 1 year ago

Hello!

I am having a go at running the sargasso_parameter_test script but I'm running into an error which I can't seem to work out.

My command:

sargasso_parameter_test rnaseq  --samples-origin 'SRR3330397' --mismatch-setting '0 2 4' --minmatch-setting '0 2 4' --multimap-setting '1' --plot-format png incarnata_sample.tsv incarnata_test incarnata /scratch/botany/katie/orchid/ParentalRNAseq/incarnata/incarnata_db fuchsii /scratch/botany/katie/orchid/ParentalRNAseq/fuchsii/fuchsii_db

my sample tsv file:

SRR3330397      /scratch/botany/katie/orchid/ParentalRNAseq/incarnata/SRR3330397_1.fastq        /scratch/botany/katie/orchid/ParentalRNAseq/incarnata/SRR3330397_2.fastq

I'm getting the error:

Error: number of sample does not equal to number of sample origin.

I've tried to match the names of the sample everywhere I think, but no luck. Do you know what could be the problem?

Thank you! :)

Katie

lweasel commented 1 year ago

Hi Katie,

@hxin knows this script a lot better than I do, but I think the --samples-origin parameter should be a space-separated list of the species that the samples contain, with one entry per sample, rather than the sample names themselves. So, because you've only got the one sample in your TSV file, it should just be the name of the species ("incarnata" or "fuchsii") that SRR3330397 contains. Could you give that a go and see what happens?

katieemelianova commented 1 year ago

Hiya,

I gave it a go using two ways of supplying the species name:

sargasso_parameter_test rnaseq  --samples-origin incarnata --mismatch-setting '0 2 4' --minmatch-setting '0 2 4' --multimap-setting '1' --plot-format png incarnata_sample.tsv incarnata_test incarnata /scratch/botany/katie/orchid/ParentalRNAseq/incarnata/incarnata_db fuchsii /scratch/botany/katie/orchid/ParentalRNAseq/fuchsii/fuchsii_db

and

(sargasso) [emelianova@login01 parameter_test]$ sargasso_parameter_test rnaseq  --samples-origin "incarnata" --mismatch-setting '0 2 4' --minmatch-setting '0 2 4' --multimap-setting '1' --plot-format png incarnata_sample.tsv incarnata_test incarnata /scratch/botany/katie/orchid/ParentalRNAseq/incarnata/incarnata_db fuchsii /scratch/botany/katie/orchid/ParentalRNAseq/fuchsii/fuchsii_db

And both still give me the error:

Error: number of sample does not equal to number of sample origin.

I will carry on trying to figure it out but if you have any other ideas I can try those too :)

Best,

Katie

lweasel commented 1 year ago

Ok, that is weird - as far as I can tell from having a look at the code, that check is just counting the number of items in the "samples-origin" list, and checking that it's equal to the number of lines in the sample TSV. Is there any chance that there's anything like an extra empty line in the sample TSV file?
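
A rough sketch of that kind of check, to show how a stray empty line would trip it. This is illustrative only and is not Sargasso's code; the file contents, the variable names and the way the counts are taken are all assumptions.

```shell
#!/usr/bin/env bash
# Illustrative only: not Sargasso's code. The file below is a made-up example.
workdir=$(mktemp -d)
samples_file="${workdir}/samples.tsv"

# One sample line followed by a stray empty line.
printf 'sample1\treads_1.fastq\treads_2.fastq\n\n' > "${samples_file}"

# One species name per sample, space-separated.
samples_origin="incarnata"

# Count every line, then only the lines containing a non-blank character.
all_lines=$(wc -l < "${samples_file}" | tr -d ' ')
non_blank_lines=$(grep -c '[^[:space:]]' "${samples_file}")

# Count the space-separated items in the origin list.
origin_count=$(echo "${samples_origin}" | wc -w | tr -d ' ')

echo "lines: ${all_lines}, non-blank lines: ${non_blank_lines}, origins: ${origin_count}"
if [ "${all_lines}" -ne "${origin_count}" ]; then
    echo "mismatch: the empty line is being counted as a sample"
fi

rm -rf "${workdir}"
```

Here the file has two lines but only one of them describes a sample, so a check based on the raw line count would report a mismatch against the single origin.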

katieemelianova commented 1 year ago

No, there was one before and I thought I had cracked it but no luck unfortunately! The tsv file looks like this with no newlines after the first one:

SRR3330397      /scratch/botany/katie/orchid/ParentalRNAseq/incarnata/SRR3330397_1.fastq        /scratch/botany/katie/orchid/ParentalRNAseq/incarnata/SRR3330397_2.fastq

lweasel commented 1 year ago

Am a bit baffled by this :-) . Two things to try:

1) Could you try replacing the tabs in your samples TSV file with spaces?

2) If that doesn't work, what do the following commands output?

SAMPLES=`cut -d ' ' -f -1 incarnata_sample.tsv | paste -d " " -s`
echo "${SAMPLES}" | awk -F' ' '{print NF}'
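
For reference, here is a sketch of what these two commands report for a tab-separated versus a space-separated samples file. The file names and paths are made up for illustration, and whether sargasso_parameter_test counts samples in exactly this way is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical demo files; names and paths are invented for illustration.
workdir=$(mktemp -d)
printf 'SRR3330397\t/path/to/reads_1.fastq\t/path/to/reads_2.fastq\n' > "${workdir}/tabs.tsv"
printf 'SRR3330397 /path/to/reads_1.fastq /path/to/reads_2.fastq\n' > "${workdir}/spaces.tsv"

count_samples() {
    # Mirrors the two diagnostic commands: take the first space-delimited
    # field of each line, join the lines with spaces, then count the fields.
    # The trailing "-" makes paste read standard input explicitly.
    local samples
    samples=$(cut -d ' ' -f -1 "$1" | paste -d ' ' -s -)
    echo "${samples}" | awk -F' ' '{print NF}'
}

tab_count=$(count_samples "${workdir}/tabs.tsv")
space_count=$(count_samples "${workdir}/spaces.tsv")

echo "tab-separated file: ${tab_count}"
echo "space-separated file: ${space_count}"

rm -rf "${workdir}"
```

With only tabs in the file, `cut -d ' '` finds no delimiter and passes the whole line through, and awk with `-F' '` then splits on any whitespace, so the single sample is counted as three fields. With spaces, the count is one.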

katieemelianova commented 1 year ago

Aha! It was the spaces that did it! Thanks so much for helping with that. Sorry, my bad for incorrectly formatting the file! :D

lweasel commented 1 year ago

Not your fault, it's not unreasonable to use tabs in a "TSV" file 😂. Weirdly, using either tabs or spaces in the samples file works for us here, so there must be something different about your execution environment, I guess? I assume it's a Linux machine that you're running on? What version of bash does it have?

katieemelianova commented 1 year ago

Yep, Linux! Not sure if it's the info you were looking for, but this is what I think is the bash version:

 echo "${BASH_VERSION}"
4.4.20(1)-release

lweasel commented 1 year ago

Hmm, exactly the same version as here! So, I don't know why it wasn't working for you with tabs - but anyway, since we've managed to find a workaround to make it work, I'll go ahead and close this?