Closed LoneKnightz closed 1 year ago
Hi,
Yes, count files and annotation files must be on the same order and have the same rows.
I would recommend using or following this Rmd:
R Block Sample Sanity contains a check.
Hope this helps. If you have any questions, let us know.
Tom Schwarzl
LoneKnightz @.***> schrieb am So., 4. Dez. 2022, 00:07:
Hi,
I followed the htseq-clip + Dewseq pipeline to process clip data.
But it gave erroe when use DESeqDataSetFromSlidingWindows:
Error in SummarizedExperiment(assays = SimpleList(counts = countData), : the rownames and colnames of the supplied assay(s) must be NULL or identical to those of the RangedSummarizedExperiment object (or derivative) to construct
Is this because countData (1462654 3) dimension not match annotationData (65312564 12)?
I get all count/ matrix files and sliding windowed annotation files from HTseq-clip.
Is there any way to fix this ?
Thanks
— Reply to this email directly, view it on GitHub https://github.com/EMBL-Hentze-group/DEWSeq/issues/7, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG2FZF4T25A5QVZHE3C643WLPHDDANCNFSM6AAAAAASS7RFU4 . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi,
Thanks for confirm.
In this case, may I ask dou you know how should I change the Htseq-clip command I used to genarate count/matrix that have same denmision as annotation files? These are the command I used:
htseq-clip annotation -g /work/prh169/work/genome/GRCm39/gencode/gencode.vM31.chr_patch_hapl_scaff.annotation.gff3 -o /work/prh169/work/genome/GRCm39/gencode/gencode.vM31.chr_patch_hapl_scaff.annotation.bed htseq-clip createSlidingWindows -i /work/prh169/work/genome/GRCm39/gencode/gencode.vM31.chr_patch_hapl_scaff.annotation.bed -o /work/prh169/work/genome/GRCm39/gencode/gencode.vM31.chr_patch_hapl_scaff.annotation.bed.window htseq-clip mapToId -a /work/prh169/work/genome/GRCm39/gencode/gencode.vM31.chr_patch_hapl_scaff.annotation.bed.window -o /work/prh169/work/genome/GRCm39/gencode/gencode.vM31.chr_patch_hapl_scaff.annotation.bed.name
htseq-clip extract -c 75 -i ../bam_gencode/${filename}Aligned.sortedByCoord.out.bam -e 1 -o ../crosslink_sites/${filename}crosslink.sites.bed htseq-clip count -i ../crosslink_sites/${filename}crosslink.sites.bed -a /work/prh169/work/genome/GRCm39/gencode/gencode.vM31.chr_patch_hapl_scaff.annotation.bed.window -o ../count/${filename}count.txt
Thanks
Hi LoneKnightz
Could you please post the first ~10-15 lines of your countData
, annotObj
files and the entire colData
data.frame ? This looks like a SummarizedExperiment Error i.e, colnames(CountData)
and rownames(colData)
being inconsistent.
HI sudeepsahadevan, This is the countData
countData_MM[1:15,] unique_id MM-IP_S60_count MM-Input_S61_count 1: ENSMUSG00000064842.3:exon0001W00003 4 6 2: ENSMUSG00000064842.3:exon0001W00004 4 6 3: ENSMUSG00000051951.6:intron0003W00229 1 0 4: ENSMUSG00000051951.6:intron0003W00230 1 0 5: ENSMUSG00000051951.6:intron0003W00231 1 0 6: ENSMUSG00000051951.6:intron0001W17342 2 0 7: ENSMUSG00000051951.6:intron0001W17343 2 0 8: ENSMUSG00000051951.6:intron0001W17344 2 0 9: ENSMUSG00000102343.2:intron0003W01626 1 0 10: ENSMUSG00000102343.2:intron0003W01627 1 0 11: ENSMUSG00000102343.2:intron0003W01628 1 0 12: ENSMUSG00000025900.14:intron0026W05652 1 0 13: ENSMUSG00000025900.14:intron0026W05653 1 0 14: ENSMUSG00000025900.14:intron0014W12048 1 0 15: ENSMUSG00000025900.14:intron0014W12049 1 0
This is the annotationData
annotationData[1:15,] unique_id chromosome begin end strand gene_id gene_name gene_type 1: ENSMUSG00000102693.2:exon0001W00001 chr1 3143475 3143525 + ENSMUSG00000102693.2 4933401J01Rik TEC 2: ENSMUSG00000102693.2:exon0001W00002 chr1 3143495 3143545 + ENSMUSG00000102693.2 4933401J01Rik TEC 3: ENSMUSG00000102693.2:exon0001W00003 chr1 3143515 3143565 + ENSMUSG00000102693.2 4933401J01Rik TEC 4: ENSMUSG00000102693.2:exon0001W00004 chr1 3143535 3143585 + ENSMUSG00000102693.2 4933401J01Rik TEC 5: ENSMUSG00000102693.2:exon0001W00005 chr1 3143555 3143605 + ENSMUSG00000102693.2 4933401J01Rik TEC 6: ENSMUSG00000102693.2:exon0001W00006 chr1 3143575 3143625 + ENSMUSG00000102693.2 4933401J01Rik TEC 7: ENSMUSG00000102693.2:exon0001W00007 chr1 3143595 3143645 + ENSMUSG00000102693.2 4933401J01Rik TEC 8: ENSMUSG00000102693.2:exon0001W00008 chr1 3143615 3143665 + ENSMUSG00000102693.2 4933401J01Rik TEC 9: ENSMUSG00000102693.2:exon0001W00009 chr1 3143635 3143685 + ENSMUSG00000102693.2 4933401J01Rik TEC 10: ENSMUSG00000102693.2:exon0001W00010 chr1 3143655 3143705 + ENSMUSG00000102693.2 4933401J01Rik TEC 11: ENSMUSG00000102693.2:exon0001W00011 chr1 3143675 3143725 + ENSMUSG00000102693.2 4933401J01Rik TEC 12: ENSMUSG00000102693.2:exon0001W00012 chr1 3143695 3143745 + ENSMUSG00000102693.2 4933401J01Rik TEC 13: ENSMUSG00000102693.2:exon0001W00013 chr1 3143715 3143765 + ENSMUSG00000102693.2 4933401J01Rik TEC 14: ENSMUSG00000102693.2:exon0001W00014 chr1 3143735 3143785 + ENSMUSG00000102693.2 4933401J01Rik TEC 15: ENSMUSG00000102693.2:exon0001W00015 chr1 3143755 3143805 + ENSMUSG00000102693.2 4933401J01Rik TEC gene_region Nr_of_region Total_nr_of_region window_number 1: exon 1 1 1 2: exon 1 1 2 3: exon 1 1 3 4: exon 1 1 4 5: exon 1 1 5 6: exon 1 1 6 7: exon 1 1 7 8: exon 1 1 8 9: exon 1 1 9 10: exon 1 1 10 11: exon 1 1 11 12: exon 1 1 12 13: exon 1 1 13 14: exon 1 1 14 15: exon 1 1 15
This is coldata
colData type MM-IP_S60_count IP MM-Input_S61_count Input
Thanks
Hi LoneKnightz,
Thanks for the data. Please be aware that 1 IP vs 1 Input comparison is not optimal for DEWSeq, and that you may have reproducibility issues.
What output do you get for class(countData_MM)
? Is it
[1] "data.frame"
OR
[1] "tbl_df" "tbl" "data.frame"
?
In the first case, please use the column unique_id
in countData_MM
as rownames(countData_MM)
and remove this column from the data.frame
.
In the second case, use the option tidy=TRUE
for DESeqDataSetFromSlidingWindows
as described in the DEWSeq manual
Hi LoneKnightz,
Unfortunately, DEWSeq (the underlying DESeq2) cannot handle 1 vs 1 comparisons, because one cannot estimate variance and therefore not really perform meaningful statistics. Explorative analysis can be done when you use DESeq2 (create a DESeqDataSet object), filter low count windows, then do VST transformation (varianceStabilizingTransformation) on it and then calculating IP - SMI (which is equivalent to plotting IP vs SMI sample and looking for deviations from the 45 degree axis). In your case, you will be only interested in values > 0 because you look for enrichments in IP.
Alternatively, and probably the better solution for this would be to use RCRUNCH (https://www.biorxiv.org/content/10.1101/2022.07.06.498949v1) or after you inspect your data and see beautiful peaks, use PUREclip or CLIPper which do 1 vs 1 comparisons.
Sorry for the inconvenience, we will make sure to highlight the minimal requirements in the vignette.
If you need any more help, please let us know. Tom
On Tue, Dec 6, 2022 at 8:23 AM Sudeep Sahadevan @.***> wrote:
Hi LoneKnightz https://github.com/LoneKnightz,
Thanks for the data. Please be aware that 1 IP vs 1 Input comparison is not optimal for DEWSeq, and that you may have reproducibility issues.
What output do you get for class(countData_MM) ? Is it [1] "data.frame" OR [1] "tbl_df" "tbl" "data.frame" ?
In the first case, please use the column unique_id in countData_MM as rownames(countData_MM) and remove this column from the data.frame.
In the second case, use the option tidy=TRUE as described in the DEWSeq manual https://bioconductor.org/packages/release/bioc/manuals/DEWSeq/man/DEWSeq.pdf
— Reply to this email directly, view it on GitHub https://github.com/EMBL-Hentze-group/DEWSeq/issues/7#issuecomment-1338897753, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG2FZHQ5CIKCRR44IGYXI3WL3SX5ANCNFSM6AAAAAASS7RFU4 . You are receiving this because you commented.Message ID: @.***>
Hi Distue
Thanks for reply.
I know that DESeq2 can only handle case with replicates. The result I am having is the prelimary result and they want to test some idea on this. If the result is good, they may sequence more as replicates.
In this way, I have to make sure this pipeline is working so it will be easy for future use.
Thanks for remind. I will try RCRUNCH or the tools you mentioned.
Hi sudeepsahadevan The output of countData_MM is
class(countData_MM) [1] "data.table" "data.frame"
I use as.data.frame convert countData_MM to data.frame and use unique_id as rownames (and delete unique_id column). but there is new error :
Error in `.rowNamesDF<-`(x, value = value) : duplicate ‘row.names‘ are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘1’, ‘10’, ‘100’, ‘101’, ‘102’, ‘103’, ‘10321’, ‘104’, ‘105’, ‘106’, ‘10601’, ‘107’, ‘10738’, ‘109’, ‘11’, ‘110’, ‘111’, ‘111188’, ‘112’, ‘113’, ‘114’, ‘11505’, ‘11525’, ‘11592’, ‘116’, ‘117’, ‘118’, ‘119’, ‘12’, ‘120’, ‘12060’, ‘121’, ‘12176’, ‘122’, ‘123’, ‘123414’, ‘125’, ‘127’, ‘128’, ‘1287’, ‘13’, ‘130’, ‘131’, ‘132’, ‘133’, ‘134’, ‘135’, ‘137’, ‘1399’, ‘14’, ‘143’, ‘144’, ‘145’, ‘146’, ‘148’, ‘149’, ‘15’, ‘150’, ‘1502’, ‘151’, ‘1520’, ‘153’, ‘154’, ‘1566’, ‘157’, ‘159’, ‘16’, ‘160’, ‘162’, ‘16241’, ‘163’, ‘1649’, ‘166’, ‘167’, ‘17’, ‘170’, ‘173’, ‘17300’, ‘174’, ‘1750’, ‘1754’, ‘176’, ‘18’, ‘184’, [... truncated]
If I convert annotationData rownames to unique_id and delete unique_id column , the errors are the same.
I feel like this mismatch caused by Htseq-clip count. Htseq-clip count may not have row for samples have no count in this slide windowed annotation . So that caused size diff between annotation and count files.
Thanks
Hi,
I followed the htseq-clip + Dewseq pipeline to process clip data.
But it gave erroe when use DESeqDataSetFromSlidingWindows:
Error in SummarizedExperiment(assays = SimpleList(counts = countData), : the rownames and colnames of the supplied assay(s) must be NULL or identical to those of the RangedSummarizedExperiment object (or derivative) to construct
Is this because countData (1462654 3) dimension not match annotationData (65312564 12)?
I get all count/ matrix files and sliding windowed annotation files from HTseq-clip.
Is there any way to fix this ?
Thanks