Closed nikolasthuesen closed 4 years ago
Hi Nikolas,
Try running the script without the --in-dir option. There may be something in the interplay with your -1, -2 options and that --in-dir option.
Thanks, Chris
Hi Chris,
Thanks for the reply. I've run the script in the same fashion for 20 samples, and it works for the other 19. I think this is triggered by an oddity of the sample, but I can try running it without the --in-dir tomorrow.
Nikolas
Hi Nikolas,
I see. In that case there is no need to run it without the --in-dir option. I agree with you that it may be something odd in the sample. It may be a read naming issue, a sequence issue, or a reporting issue. The following command is what HISAT-genotype uses to align the reads for extraction. The resulting SAM file should help determine the root of your error.
hisat2 -p 10 --no-spliced-alignment -X 1000 -x /home/projects/cu_10148/people/nikthu/hisatgenotype_test/indicies/genotype_genome -1 /home/projects/cu_10148/people/nikthu/data/1000G_ten_benchmark/HG01341.bam_reads1.fastq -2 /home/projects/cu_10148/people/nikthu/data/1000G_ten_benchmark/HG01341.bam_reads2.fastq
I think I have that formatted for your specific use case, so you should, hopefully, be able to run it directly by copy-and-paste (note that it will print the SAM output to your console unless you redirect it to a file with '>'). Let me know what you find. Hope this helps!
Thanks, Chris
Hi Chris
Sorry for the late reply. I have now investigated the case further and found that the cause of the error was a couple of unpaired reads in my fastq files. The error was therefore not in hisat2 or HISAT-genotype, but in my data conversion. Sorry for the inconvenience. The only thing that could be improved in your tools in relation to this case is perhaps a warning to the user when the input fastq files are faulty, instead of the current behaviour, where the tool gives a prediction based on a very limited number of reads.
Thanks, Nikolas
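For anyone hitting the same symptom, a quick pre-check of the two fastq files may save some time. The following is only a minimal sketch, not part of hisat2 or HISAT-genotype: the function names `fastq_names` and `find_pairing_problems` are hypothetical, and it assumes plain-text (uncompressed) 4-line fastq records in which mates appear in the same order and share a read name, optionally ending in /1 and /2.

```python
from itertools import zip_longest


def fastq_names(path):
    """Yield the read name of each record in a plain-text, 4-line FASTQ file.

    Assumption: every record spans exactly four lines, so header lines sit
    at line offsets 0, 4, 8, ... A trailing /1 or /2 mate suffix is removed.
    """
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0 and line.strip():
                fields = line[1:].split()  # drop the leading '@' and any comment
                name = fields[0] if fields else ""
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def find_pairing_problems(fastq1, fastq2, max_report=10):
    """Return up to max_report tuples (record_index, name1, name2) where the
    two files disagree. A name of None means that file ended early."""
    problems = []
    pairs = zip_longest(fastq_names(fastq1), fastq_names(fastq2))
    for index, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            problems.append((index, name1, name2))
            if len(problems) >= max_report:
                break
    return problems
```

An empty list returned by `find_pairing_problems` suggests the files are consistently paired under the assumptions above. Once a single mate is missing, every later position will also mismatch, so the first reported record is usually the informative one.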
Hi Nikolas,
Thanks for the update! That's exactly my thought. I'll make a note to add an error check in the next version. Thanks!
Thanks, Chris
https://github.com/DaehwanKimLab/hisat-genotype/blob/cf91052c5a6c96aa4804e57e946cca1856935822/hisatgenotype_modules/hisatgenotype_typing_process.py#L1648
When running hisatgenotype on the sample HG01341, I get the following error:
I tried printing read2 in each iteration and found that, in the iteration with the error, it was an empty list.
The same was true for read1, by the way.
Hisatgenotype then continued with the reads that were extracted before the error and returned the faulty result: