EMBL-Hentze-group / DEWSeq

R/Bioconductor package for e/iCLIP data analysis
5 stars 1 forks source link

DESeqDataSetFromSlidingWindows issue #2

Closed connorrogerson closed 2 years ago

connorrogerson commented 2 years ago

When trying to generate DESeq object I get this error: ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = annotation_file, colData=col_data, design=~type)

Warning in DESeqDataSetFromSlidingWindows(countData = count_matrix, annotObj = annotation_file, : Cannot find chromosomal positions for all entries in countData. countData rows with missing annotation will be removed ! Error in DESeqDataSet(se, design = design, ignoreRank) : all samples have 0 counts for all genes. check the counting script.

head(count_matrix) gives me: smb-hk-2-22-fhevci1-rep1-20200123-ju_trimmed ENSG00000227232.5:intron0005W00067 1 ENSG00000227232.5:exon0004W00079 0 ENSG00000227232.5:intron0002W00089 2 ENSG00000227232.5:intron0001W00221 0 ENSG00000279457.4:intron0008W00009 0 ENSG00000279457.4:intron0007W00029 0 smb-hk-2-23-fhevci8-rep1-20200123-ju_trimmed ENSG00000227232.5:intron0005W00067 0 ENSG00000227232.5:exon0004W00079 0 ENSG00000227232.5:intron0002W00089 1 ENSG00000227232.5:intron0001W00221 0 ENSG00000279457.4:intron0008W00009 0 ENSG00000279457.4:intron0007W00029 1

I thought the chromosomal positions came from the annotation object rather than the matrix file?

sudeepsahadevan commented 2 years ago

That count_matrix looks odd. Does it have only one column of counts ? Could you please list down the steps how you got your count_matrix ? It is easy to diagnose the issue once we know in detail the exact steps you have done

connorrogerson commented 2 years ago

It looks OK on Rstudio? It looks like this:

image

My steps were:

  1. Extraction of x-links htseq-clip extract -i /rds/user/cjr78/hpc-work/iCLIP/bam/$INFILE -o /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/extract_xlink/$INFILE.xlink.bed -e 1 -s s -g -1 -c $SLURM_NTASKS
  2. Counts xlinks htseq-clip count -i /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/extract_xlink/$INFILE.xlink.bed -a /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/annotation/SLBP_w100s50.txt.gz -o /rds/user/cjr78/hpc-work/iCLIP/htseq-clip/counts/$INFILE.csv
  3. Combine into count file htseq-clip createMatrix -i counts/ -b smb -o counts/iCLIP_HK2_SLBP_w50s20_counts.txt
sudeepsahadevan commented 2 years ago

Thanks, now we have a better idea. Now I suspect that it might be column class issue. Could you please crosscheck the class of all columns and make sure that they are integers ? If we avoid the warning for now, the error comes from DESeq2 because the RangedSummarizedExperiment object have all 0 values. I suspect this might be because the columns are not all integer or numeric

connorrogerson commented 2 years ago

OK, checking the class returns integer for all columns

sudeepsahadevan commented 2 years ago

hmm in this case, please share a subset of your count_matrix and annotation_file where you can reproduce the error with us, we need to take a closer look to diagnose the issue.

connorrogerson commented 2 years ago

SLBP_w100s50_annotation.subset.txt.gz iCLIP_HK2_SLBP_w100s50_counts.txt

I've attached the two files and I can reproduce the error on my R. Let me know if you need anything else.

sudeepsahadevan commented 2 years ago

There is something wrong with the SLBP_w100s50_annotation.subset.txt.gz file, the first column unique_id is supposed to be unique (!) and one per row, but in your case, they repeat. Please send us the details on the genome annotation that you used and your python environment. If this appears to be an htseq-clip issue, I will close this issue here and open a new one in htseq-clip repo.

connorrogerson commented 2 years ago

OK! I downloaded the genome annotation as per the protocol I was following (https://link.springer.com/protocol/10.1007%2F978-1-0716-1851-6_10).

wget ftp:// ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gff3.gz

I then just followed these steps: htseq-clip annotation -g gencode.v27.annotation.gff3 -o gencode.v27.annotation.bed htseq-clip createSlidingWindows -i gencode.v27.annotation.bed -w 100 -s 50 -o SLBP_w100s50.txt htseq-clip mapToId -a SLBP_w100s50.txt -o SLBP_w100s50_annotation.txt.gz

sudeepsahadevan commented 2 years ago

Closing this issue in this repo as it is an htseq-clip issue. Opened a new issue in htseq-clip repo