Closed connorrogerson closed 2 years ago
That count_matrix
looks odd. Does it have only one column of counts ? Could you please list down the steps how you got your count_matrix
? It is easy to diagnose the issue once we know in detail the exact steps you have done
It looks OK on Rstudio? It looks like this:
My steps were:
htseq-clip extract -i /rds/user/cjr78/hpc-work/iCLIP/bam/$INFILE -o /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/extract_xlink/$INFILE.xlink.bed -e 1 -s s -g -1 -c $SLURM_NTASKS
htseq-clip count -i /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/extract_xlink/$INFILE.xlink.bed -a /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/annotation/SLBP_w100s50.txt.gz -o /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/counts/$INFILE.csv
htseq-clip createMatrix -i counts/ -b smb -o counts/iCLIP_HK2_SLBP_w50s20_counts.txt
Thanks, now we have a better idea. Now I suspect that it might be column class issue. Could you please crosscheck the class
of all columns and make sure that they are integers ?
If we avoid the warning for now, the error comes from DESeq2 because the RangedSummarizedExperiment
object have all 0 values. I suspect this might be because the columns are not all integer
or numeric
OK, checking the class
returns integer
for all columns
hmm in this case, please share a subset of your count_matrix
and annotation_file
where you can reproduce the error with us, we need to take a closer look to diagnose the issue.
SLBP_w100s50_annotation.subset.txt.gz iCLIP_HK2_SLBP_w100s50_counts.txt
I've attached the two files and I can reproduce the error on my R. Let me know if you need anything else.
There is something wrong with the SLBP_w100s50_annotation.subset.txt.gz
file, the first column unique_id
is supposed to be unique (!) and one per row, but in your case, they repeat. Please send us the details on the genome annotation that you used and your python environment. If this appears to be an htseq-clip issue, I will close this issue here and open a new one in htseq-clip repo.
OK! I downloaded the genome annotation as per the protocol I was following (https://link.springer.com/protocol/10.1007%2F978-1-0716-1851-6_10).
wget ftp:// ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gff3.gz
I then just followed these steps:
htseq-clip annotation -g gencode.v27.annotation.gff3 -o gencode.v27.annotation.bed
htseq-clip createSlidingWindows -i gencode.v27.annotation.bed -w 100 -s 50 -o SLBP_w100s50.txt
htseq-clip mapToId -a SLBP_w100s50.txt -o SLBP_w100s50_annotation.txt.gz
Closing this issue in this repo as it is an htseq-clip issue. Opened a new issue in htseq-clip repo
When trying to generate DESeq object I get this error:
ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = annotation_file, colData=col_data, design=~type)
Warning in DESeqDataSetFromSlidingWindows(countData = count_matrix, annotObj = annotation_file, :
Cannot find chromosomal positions for all entries in countData.
countData rows with missing annotation will be removed !
Error in DESeqDataSet(se, design = design, ignoreRank) :
all samples have 0 counts for all genes. check the counting script.
head(count_matrix)
gives me:smb-hk-2-22-fhevci1-rep1-20200123-ju_trimmed
ENSG00000227232.5:intron0005W00067 1
ENSG00000227232.5:exon0004W00079 0
ENSG00000227232.5:intron0002W00089 2
ENSG00000227232.5:intron0001W00221 0
ENSG00000279457.4:intron0008W00009 0
ENSG00000279457.4:intron0007W00029 0
smb-hk-2-23-fhevci8-rep1-20200123-ju_trimmed
ENSG00000227232.5:intron0005W00067 0
ENSG00000227232.5:exon0004W00079 0
ENSG00000227232.5:intron0002W00089 1
ENSG00000227232.5:intron0001W00221 0
ENSG00000279457.4:intron0008W00009 0
ENSG00000279457.4:intron0007W00029 1
I thought the chromosomal positions came from the annotation object rather than the matrix file?