Closed pwl closed 6 years ago
Perhaps related: the refFlat file generated by running the meta
target turned out to be empty, so I updated my generate_meta.smk
to fix that. Perhaps that's the source of the issue?
Yes, if you look at the sample1_final.txt in the headers, all your reads are tagged as XF:Z:INTERGENIC. Those are not counted as mapped to genes.
I'm pretty sure this is related to the modification of the refFlat file. Which genome and annotation files are you using?
I'm using the first gtf file from GENCODE, the "Comprehensive gene annotation" one for "CHR" regions.
EDIT: and for the genome I'm using the "Transcript sequences" from the same page.
Hi @pwl I think your issue comes from the transcripts.fa You should use Genome sequence (GRCh38.p10). I'm actually puzzled as to how it didn't throw an error (mine did) when using the transcripts.fa. Using the genome sequence, the refflat is not empty. Can you try again?
I'll try it, thanks. I also just realized that there are reference files (including .fa
and .gtf
files) on the drop-seq webpage in the "Data resources" section, I just haven't found them before and tried to use ones from GENCODE.
Do you have a mixed experiment? Mouse/human or only human? I'm using those DNA and gene sets GTF file
I tried it with the genome sequence GRCh38.p10
and it worked perfectly! Thank you!
I run the whole pipeline (
meta qc filter map extract
) on one sample with 4M reads. Most of the pipeline run without any issues and then an R script threw an error near the end of the analysis. When I started debugging the scripts I found that all of the summary files are empty (apart from their headers).You can find the reports here. What's surprising is that the results from qc look reasonable. Also, I'm no expert on bam files but they all seem pretty reasonable to me (I added the contents of the first 100 lines of the intermediate bam files converted to plain text with
samtools view
into thebamheads
directory). You can find the full logs from running the pipeline instderr.log
andstdout.log
. Perhaps I missed some option or specified a wrong threshold?I used a standard pair of gtf/fasta files downloaded straight from GENCODE.