pbpayal opened this issue 6 years ago (status: Open)
@lpantano, any ideas here?
Hi,
It should work with BAM files; I think the problem is in parsing the GTF.
Do your BAM files use the same chromosome naming as the GTF? All the output files being empty is strange; at the very least you should get a *.tsv file with something in it.
After that, we can debug why the transcript part is not working.
Can you check that?
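One way to check the chromosome naming is to compare the reference names in the BAM header (for example, the output of samtools view -H Test1.bam) with the first column of the GTF. The sketch below is a hypothetical helper, not part of this repo; it works on the header and the GTF as plain text, so it needs no extra libraries.

```python
def sam_header_chroms(header_text):
    """Return the reference names from the @SQ lines of a SAM header (SN:<name> tags)."""
    chroms = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    chroms.add(field[3:])
    return chroms


def gtf_chroms(gtf_text):
    """Return the seqname (first column) of every non-comment GTF line."""
    chroms = set()
    for line in gtf_text.splitlines():
        if line and not line.startswith("#"):
            chroms.add(line.split("\t", 1)[0])
    return chroms


def compare_chroms(header_text, gtf_text):
    """Report which chromosome names are shared and which appear in only one file."""
    bam = sam_header_chroms(header_text)
    gtf = gtf_chroms(gtf_text)
    return {
        "shared": sorted(bam & gtf),
        "only_in_bam": sorted(bam - gtf),
        "only_in_gtf": sorted(gtf - bam),
    }
```

If "shared" comes back empty (for example "chr1" in the BAM but "1" in the GTF), htseq-count has no features to assign reads to.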
The main command we run is the following ({fn_in}, {gtf} and {out} are filled in by counter.py):
samtools sort -n -O sam {fn_in} -o /dev/stdout | awk '$7=="="' | htseq-count -s no -i gene - {gtf} > {out}
That should produce a file with non-empty counts.
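For reference, the awk '$7=="="' step keeps only lines whose 7th field (RNEXT in SAM) is "=", i.e. reads whose mate is mapped to the same reference sequence. A rough Python equivalent, written only to illustrate the filter (the function name is hypothetical):

```python
def same_chrom_mates(sam_lines):
    """Mimic awk '$7=="="' on SAM text lines.

    awk's default field splitting (runs of spaces/tabs) is approximated with
    str.split(). Column 7 (RNEXT) is "=" when the mate maps to the same
    reference sequence as the read; every other line is discarded.
    """
    kept = []
    for line in sam_lines:
        fields = line.split()
        if len(fields) >= 7 and fields[6] == "=":
            kept.append(line)
    return kept
```

Two side effects follow directly from the expression: unpaired reads (RNEXT "*") are discarded, so single-end input would leave htseq-count with nothing to count, and SAM header lines are dropped too, since they normally have no 7th field equal to "=".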
Cheers
But I only used the test files I downloaded from this repo, and I checked that both the BAM and GTF files use chr annotation!
Could it be because "htseq-count -s no -i gene - {gtf} > {out}" doesn't include the SAM/BAM input file in the command?
The SAM input is piped in: the - argument tells htseq-count to read alignments from standard input. With the test files, the full command would be something like this:
samtools sort -n -O sam Test1.bam -o /dev/stdout | awk '$7=="="' | htseq-count -s no -i gene - ref-transcripts.gtf > sample.tsv
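The {fn_in}, {gtf} and {out} placeholders are presumably filled in with Python's str.format before the shell runs the pipeline; counter.py's actual code may differ. A minimal sketch under that assumption (COUNT_CMD and build_count_command are hypothetical names) shows that the BAM file goes to samtools, and htseq-count receives the alignments through the pipe via -:

```python
# Hypothetical template mirroring the command above. The backslash-escaped
# quotes are only needed because the awk program sits inside a Python string.
COUNT_CMD = (
    "samtools sort -n -O sam {fn_in} -o /dev/stdout"
    " | awk '$7==\"=\"'"
    " | htseq-count -s no -i gene - {gtf} > {out}"
)


def build_count_command(fn_in, gtf, out):
    """Fill in the template; the lone '-' makes htseq-count read SAM from stdin."""
    return COUNT_CMD.format(fn_in=fn_in, gtf=gtf, out=out)
```

The resulting string could then be run with subprocess.run(cmd, shell=True, check=True); shell=True is needed because the command relies on pipes and output redirection.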
I will try to replicate the issue tomorrow; sorry about this.
ok, I see the issue.
@eweitz, can you remind me what input file we need for the ideogram? Does it have to be the Entrez gene symbol or ID plus the expression?
@lpantano, the step in our pipeline after counter.py is formatter.py, which takes an input file like SRR562645_counts_norm.tsv produced by counter.py. I believe that TSV file contains the gene symbol (e.g. BRCA1) and expression. The formatter.py script then outputs a JSON file containing custom-formatted annotations, which is the input for Ideogram.js.
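A rough sketch of that conversion step, assuming the TSV has no header row and two tab-separated columns (symbol, value). Ideogram.js also needs a position for each annotation, so the sketch takes a symbol-to-(chromosome, start, length) mapping as input. The helper names and the keys/annots layout below are illustrative assumptions, not formatter.py's actual output format.

```python
import json
from collections import defaultdict


def read_expression_tsv(lines):
    """Parse 'symbol<TAB>value' rows into a dict.

    htseq-count output ends with summary rows such as __no_feature; any rows
    starting with '__' are skipped in case they survive normalization.
    """
    expr = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("__"):
            continue
        symbol, value = line.split("\t")[:2]
        expr[symbol] = float(value)
    return expr


def to_annotations(expr, positions):
    """Group genes by chromosome into a hypothetical annotation layout.

    positions maps symbol -> (chromosome, start, length); genes without a
    known position are skipped.
    """
    by_chr = defaultdict(list)
    for symbol, value in sorted(expr.items()):
        if symbol in positions:
            chrom, start, length = positions[symbol]
            by_chr[chrom].append([symbol, start, length, value])
    return {
        "keys": ["name", "start", "length", "expression"],
        "annots": [{"chr": c, "annots": by_chr[c]} for c in sorted(by_chr)],
    }


def to_json(annotations):
    """Serialize the annotations for writing to a .json file."""
    return json.dumps(annotations, indent=2)
```

GENE_A and its coordinates in any test data are placeholders; real positions would come from the same annotation the counts were built from.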
Can I use this tool on local BAM files for my project? I haven't submitted my data to SRA yet.
python counter.py --inp test/Test1.bam --out Test1_counts --gtf test/ref-transcripts.gtf
Error:
Using this file annotation test/ref-transcripts.gtf
samtools sort -n -O sam test/Test1.bam -o /dev/stdout | awk '$7=="="' | htseq-count -s no -i gene - test/ref-transcripts.gtf > Test1_counts.tsv
Traceback (most recent call last):
  File "counter.py", line 150, in <module>
    out_fn = normalize(out_fn, gtf)
  File "counter.py", line 68, in normalize
    size = _get_size(gtf)
  File "counter.py", line 32, in _get_size
    transcript_id = feature.attr['ID']
KeyError: 'ID'
I know it's mentioned that "GTF needs to have ID and gene in the attributes field," but what does that mean exactly? I tried replacing the code:
The program runs, but it only produces empty output files!
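In the command above, -i gene tells htseq-count to group features by their gene attribute, and the traceback shows _get_size reading feature.attr['ID'] into a variable named transcript_id. So every GTF line needs both an ID attribute (a transcript identifier) and a gene attribute (the gene symbol that ends up in the TSV). GTF 2.2 mandates gene_id and transcript_id instead, and Ensembl/GENCODE files also carry gene_name. A hypothetical helper that appends the two missing attributes, copying ID from transcript_id and gene from gene_name (falling back to gene_id):

```python
def parse_attributes(attr_field):
    """Parse a GTF attribute column like 'gene_id "G1"; transcript_id "T1";'.

    Simplified: assumes no ';' inside quoted values.
    """
    attrs = {}
    for part in attr_field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def add_id_and_gene(line):
    """Return the GTF line (without trailing newline) with ID/gene appended when missing."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return line
    fields = line.split("\t")
    if len(fields) != 9:
        return line  # not a standard 9-column GTF record; leave untouched
    attrs = parse_attributes(fields[8])
    extra = []
    if "ID" not in attrs and "transcript_id" in attrs:
        transcript = attrs["transcript_id"]
        extra.append(f'ID "{transcript}";')
    if "gene" not in attrs:
        symbol = attrs.get("gene_name", attrs.get("gene_id"))
        if symbol is not None:
            extra.append(f'gene "{symbol}";')
    if extra:
        fields[8] = fields[8].rstrip().rstrip(";") + "; " + " ".join(extra)
    return "\t".join(fields)


# Example use (hypothetical output path):
# with open("test/ref-transcripts.gtf") as src, open("fixed.gtf", "w") as dst:
#     for line in src:
#         dst.write(add_id_and_gene(line) + "\n")
```

This only addresses the KeyError; whether the counts come out non-empty still depends on the chromosome naming and on the awk filter in the pipeline.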