Roleren / ORFik

validateExperiments(df): experiment table has non-unique rows! #162

Closed SuhasSrinivasan closed 8 months ago

SuhasSrinivasan commented 8 months ago

Hello again, sorry for reporting so many issues. Maybe this error is an edge case.

Help would be much appreciated!

Full error

Error in validateExperiments(df): experiment table has non-unique rows! update either replicate, stage, condition or fraction,to get non unique rows!
create.experiment(file.path(conf["bam Ribo-seq"], "aligned/"),
                  exper = conf["exp Ribo-seq"],
                  fa = annotation["genome"],
                  txdb = paste0(annotation["gtf"], ".db"),
                  organism = organism,
                  pairedEndBam = paired.end.rfp,
                  rep = c(1,2,3,1,1,2,3,1))

There are no duplicate rows and the issue persists even after providing values for replicate, stage and condition. Initially stage and condition were partially filled by the automated parsing.

"name","ribo_crispr_Ribo-seq","","","",""
"gff","/Bio_data/references/hs_grch38/Homo_sapiens.GRCh38.111._ensembl.gtf.db","","","organism","Homo sapiens"
"fasta","/Bio_data/references/hs_grch38/Homo_sapiens.GRCh38.dna.primary_assembly.fa","","","",""
"libtype","stage","rep","condition","fraction","filepath"
"RFP","Cardio","1","Cardio","Cardio","/Bio_data/processed_data/Ribo-seq/ribo_crispr/aligned/Cardio_RFP_1_Aligned.sortedByCoord.out.bam"
"RFP","Cardio","2","Cardio","Cardio","/Bio_data/processed_data/Ribo-seq/ribo_crispr/aligned/Cardio_RFP_2_Aligned.sortedByCoord.out.bam"
"RFP","Cardio","3","Cardio","Cardio","/Bio_data/processed_data/Ribo-seq/ribo_crispr/aligned/Cardio_RFP_3_Aligned.sortedByCoord.out.bam"
"RFP","Cardio","1","Cardio","Cardio","/Bio_data/processed_data/Ribo-seq/ribo_crispr/aligned/Cardio_ribo_har_Aligned.sortedByCoord.out.bam"
"RFP","iPSC","1","iPSC","iPSC","/Bio_data/processed_data/Ribo-seq/ribo_crispr/aligned/iPSC_RFP_1_Aligned.sortedByCoord.out.bam"
"RFP","iPSC","2","iPSC","iPSC","/Bio_data/processed_data/Ribo-seq/ribo_crispr/aligned/iPSC_RFP_2_Aligned.sortedByCoord.out.bam"
"RFP","iPSC","3","iPSC","iPSC","/Bio_data/processed_data/Ribo-seq/ribo_crispr/aligned/iPSC_RFP_3_Aligned.sortedByCoord.out.bam"
"RFP","iPSC","1","iPSC","iPSC","/Bio_data/processed_data/Ribo-seq/ribo_crispr/aligned/iPSC_ribo_har_Aligned.sortedByCoord.out.bam"
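
To see which rows collide, the design columns of the table above can be compared directly. This is a minimal sketch in base R, not ORFik's own check: it assumes, based only on the wording of the error message, that uniqueness is judged on the combination of libtype, stage, rep, condition and fraction (the filepath column is ignored here). The names `df`, `design` and `dup` are illustrative.

```r
# Design columns of the experiment table as posted above (filepath omitted).
df <- data.frame(
  libtype   = "RFP",
  stage     = rep(c("Cardio", "iPSC"), each = 4),
  rep       = c(1, 2, 3, 1, 1, 2, 3, 1),
  condition = rep(c("Cardio", "iPSC"), each = 4),
  fraction  = rep(c("Cardio", "iPSC"), each = 4)
)

# Assumption: these are the columns that must be jointly unique.
design <- c("libtype", "stage", "rep", "condition", "fraction")

# Flag every row whose design combination occurs more than once.
# duplicated() alone marks only the second and later occurrences,
# so it is combined with a scan from the end of the table.
dup <- duplicated(df[, design]) | duplicated(df[, design], fromLast = TRUE)

which(dup)   # row numbers of the colliding samples
df[dup, ]    # the colliding rows themselves
```

Under that assumption, the Harringtonine samples (rows 4 and 8) share their design values with replicate 1 of the same cell type (rows 1 and 5), because stage, condition and fraction all carry the same label and only rep distinguishes the samples.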

PROBLEMATIC WORKAROUND: The issue can only be resolved if the rep column has consecutive values 1,2,3,4,1,2,3,4, but this does not reflect the actual replicate numbers. The fourth sample in each of Cardio and iPSC is Harringtonine-treated.


P.S. It would be great to know which column is causing the issue. Also, a suggested change to the error message:

Error in validateExperiments(df): Experiment table has non-unique rows! Update either replicate, stage, condition or fraction, to get unique rows!

SuhasSrinivasan commented 8 months ago

POTENTIAL WORKAROUND: Setting a different condition for the sample with only one replicate works and is better than using contiguous values for rep.

   libtype  stage rep condition
1:     RFP Cardio   1       CHX
2:     RFP Cardio   2       CHX
3:     RFP Cardio   3       CHX
4:     RFP Cardio   1      Harr
5:     RFP   iPSC   1       CHX
6:     RFP   iPSC   2       CHX
7:     RFP   iPSC   3       CHX
8:     RFP   iPSC   1      Harr

Roleren commented 8 months ago

Great, I will take a look at these today/Monday, but for now, your "workaround" is the way I intend the table to be made. Harr is a feature of the sample, so it should be included to make the experimental design unique.

Depending on your goal, the best thing is always to put the major factor of interest into the condition column. So leave stage empty, set condition to Cardio/iPSC and put the treatment into the fraction column. That is what I usually do at least :)
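
The layout described above can be sketched as a plain table and checked for uniqueness before building the experiment. This is an illustration in base R only. The treatment labels CHX and Harr are taken from the workaround earlier in the thread, and the name `df` is illustrative. Presumably the same vectors could be passed to `create.experiment()` through its `stage`, `condition` and `fraction` arguments, in the same way `rep` is passed in the original call, but that step is an assumption and is not run here.

```r
# Suggested design: stage left empty, cell type in condition,
# treatment in fraction (filepath column omitted).
df <- data.frame(
  libtype   = "RFP",
  stage     = "",
  rep       = c(1, 2, 3, 1, 1, 2, 3, 1),
  condition = rep(c("Cardio", "iPSC"), each = 4),
  fraction  = rep(c("CHX", "CHX", "CHX", "Harr"), times = 2)
)

# anyDuplicated() returns 0 when no row repeats an earlier one,
# otherwise the index of the first repeated row.
anyDuplicated(df)
```

With the treatment in its own column, the replicate numbers can stay as they are in the file names (1, 2, 3 for CHX and 1 for Harr) and every row is still distinct.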

Roleren commented 8 months ago

There is now a much better error text for this, and I also added some more tests for valid experiments. Pushed to the master branch on GitHub.

Open a new issue if there is something else.