WGLab / VirTect

Detection of viruses from RNA-Seq on human samples

Syntax errors in script #11

Open KevinMaroney opened 1 year ago


Hey, I was asked to check for viral reads in our bulk RNA-seq data, but when I ran your script following your detailed tutorial, I kept getting errors from the "print" syntax throughout the script. I'm not sure whether this was always there, but I'm letting you know in case others are running into the issue and haven't paid attention to the error messages:

I was able to fix the following in VirTect.py:

  1. replace all instances of [print 'Running '] with [print("Running")]
  2. On line 194, replace [print '\'] with [print('\t')]
  3. On line 195, replace [print line.strip()] with [print(line.strip())] (the same print issue)
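For anyone applying the same fixes by hand: the errors come from Python 2 print statements being run under Python 3, where print is a function. The sketch below is only an illustration of the pattern in the three fixes above, not part of VirTect; the helper name `to_print_function` is made up here, and it only handles simple one-line statements (no `print >>f, x`, no trailing-comma form, no multi-line statements). It expects lines without a trailing newline.

```python
import re

# Matches a simple Python 2 print statement: optional indentation, the word
# "print", at least one space, then an argument that does not start with "(".
_PY2_PRINT = re.compile(r"^(\s*)print\s+(?!\()(.+?)\s*$")


def to_print_function(line: str) -> str:
    """Rewrite `print X` as `print(X)`; return any other line unchanged.

    Hypothetical helper for illustration only. Lines that already call
    print(...) are left as they are.
    """
    match = _PY2_PRINT.match(line)
    if match is None:
        return line
    indent, args = match.groups()
    return f"{indent}print({args})"


# Example usage on the statement from line 195:
#   to_print_function("    print line.strip()")
```

Review each rewritten line before saving, since a regex like this cannot tell a real statement from the same text inside a string or comment.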

I then got the error "no GTF file!", so I assumed one had not been downloaded. However, your documentation expects "gencode.v25.chr_patch_hapl_scaff.annotation.gtf", whereas what I got after downloading and indexing with your code was "gencode.v29.annotation.gtf.gz". When I tried running with that file, the first error I ran into was as follows, which suggests that something is wrong with the GTF file downloaded/indexed by your code:
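The report does not establish why the downloaded annotation was rejected. One cheap thing to rule out is that the file is still gzip-compressed when a plain .gtf is expected. The sketch below is an assumption-laden illustration, not VirTect code: `ensure_plain_gtf` is a hypothetical helper that writes an uncompressed copy next to a `.gz` file and returns the path to use.

```python
import gzip
import shutil
from pathlib import Path


def ensure_plain_gtf(path):
    """Return a path to an uncompressed GTF file.

    Hypothetical helper: if `path` ends in .gz, decompress it alongside the
    original (once) and return the new path; otherwise return it unchanged.
    """
    path = Path(path)
    if path.suffix != ".gz":
        return path
    target = path.with_suffix("")  # e.g. foo.gtf.gz -> foo.gtf
    if not target.exists():
        with gzip.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target


# Example usage with the file name from the report:
#   gtf = ensure_plain_gtf("gencode.v29.annotation.gtf.gz")
```

If the file is already uncompressed and the error persists, the cause lies elsewhere, for example in a mismatch between the annotation and the indexed genome.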

[2023-06-29 14:22:13] Building transcriptome data files /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/tmp/gencode.v29.annotation.gtf
[FAILED]
Error: gtf_to_fasta returned an error.
Running samtools sort -n /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped.bam -o /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted.bam
[E::hts_open_format] Failed to open file "/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped.bam" : No such file or directory
samtools sort: can't open "/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped.bam": No such file or directory
Running bedtools bamtofastq -i /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted.bam -fq /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted_1.fq -fq2 /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted_2.fq
[E::hts_open_format_impl] Failed to open file /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted.bam
Failed to open BAM file /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted.bam
Running bwa mem /home/kmaroney/programs/VirTect/viruses_reference/viruses_759.fasta /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted_1.fq /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted_2.fq > /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_aln.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem /home/kmaroney/programs/VirTect/viruses_reference/viruses_759.fasta /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted_1.fq /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_sorted_2.fq
[main] Real time: 0.138 sec; CPU: 0.011 sec
Running samtools view -Sb -h /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_aln.sam > /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_aln.bam
Running samtools view /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_aln.bam | cut -f3 | sort | uniq -c | awk '{if ($1>=400) print $0}' > /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtect/unmapped_viruses_count.txt
awk: cmd. line:1: { if ($2!=(ploc+1)) {if (ploc!=0){printf("%s %d-%d
awk: cmd. line:1: ^ unterminated string
awk: cmd. line:1: { if ($2!=(ploc+1)) {if (ploc!=0){printf("%s %d-%d
awk: cmd. line:1: ^ syntax error
The continous length
----------------------------------------Note: There is no real virus in the sample :)----------------------------
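Note that in the log above, every step after the gtf_to_fasta failure still ran against files that were never created, so the closing "no real virus" message reflects an empty input rather than a real result. A minimal sketch of stopping at the first failed step is shown below. This is not how VirTect.py is written; `run_step` is a hypothetical wrapper, and it assumes the commands are run through a POSIX shell.

```python
import subprocess


def run_step(command: str) -> None:
    """Run one shell command and raise if it exits with a non-zero status.

    Hypothetical wrapper for illustration: with check=True, subprocess.run
    raises CalledProcessError, so later steps never see missing inputs.
    """
    print(f"Running {command}")
    subprocess.run(command, shell=True, check=True)


# Example usage (file names are placeholders):
#   run_step("samtools sort -n unmapped.bam -o unmapped_sorted.bam")
#   run_step("bedtools bamtofastq -i unmapped_sorted.bam -fq r1.fq -fq2 r2.fq")
```

With a wrapper like this, the run would have stopped at the first error instead of reporting a misleading final summary.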

However, when I used a GTF file and genome that I had previously indexed, it seems to be working (it is now stuck on the "preparing reads" step rather than failing quickly). So I think the code itself is fine, apart from a couple of syntax errors and a problem with the human reference. I just wanted to put this here to be helpful; anyone having the same issues should be able to solve them with this. Looking forward to getting my viral reads :)