bartongroup / RATS

Relative Abundance of Transcripts: An R package for the detection of Differential Transcript isoform Usage.
MIT License

Error parsing kallisto abundance.h5 files #67

Closed: Angad-33 closed this issue 2 years ago

Angad-33 commented 3 years ago

Hi there,

I used the following code to parse the kallisto-generated files, but ended up with the error below. Can you suggest what is causing it?

mydata <- fish4rodents(A_paths= sample_TM, B_paths= sample_TL, annot= t2g, half_cooked=TRUE)

Error in h5checktypeOrOpenLoc(file, readonly = TRUE, fapl = NULL, native = native) : Error in h5checktypeOrOpenLoc(). Cannot open file. File 'NA/abundance.h5' does not exist.

fruce-ki commented 3 years ago

Hello Angad.

From the looks of "NA/abundance.h5", I'd say at least one of your paths was created from undefined (NA) values. Check the contents of sample_TM and sample_TL. They must be valid paths relative to your working directory.
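One way to follow this advice before calling fish4rodents() is to check each path vector for NA entries and for a missing abundance.h5. The sketch below assumes, as the error message suggests, that each path is a kallisto output directory containing abundance.h5. The helper name check_kallisto_dirs is hypothetical and is not part of RATS.

```r
# Hypothetical helper (not part of RATS): list problems in a vector of
# kallisto output directories before passing it to fish4rodents().
# Assumes each directory should contain an abundance.h5 file.
check_kallisto_dirs <- function(paths) {
  problems <- character(0)
  # NA entries are what turn into the literal "NA/abundance.h5" path.
  if (anyNA(paths)) {
    problems <- c(problems, "vector contains NA values")
  }
  # Check that every non-NA directory actually holds an abundance.h5.
  h5 <- file.path(paths[!is.na(paths)], "abundance.h5")
  missing <- h5[!file.exists(h5)]
  if (length(missing) > 0) {
    problems <- c(problems, paste("missing file:", missing))
  }
  problems
}
```

For example, check_kallisto_dirs(sample_TM) returning character(0) means no problems were found for that vector.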

Angad-33 commented 3 years ago

Hello Fruce,

Thanks for your quick response. I figured out the issue: the two conditions had an unequal number of sample files. That is, there were 10 files in sample_TM and 8 in sample_TL.

Best,
Angad

fruce-ki commented 3 years ago

I am happy it is working again for you.

csoeder commented 2 years ago

I'd been struggling with this and was about to open an issue. Does the algorithm actually require the same number of replicates in both conditions? In the case I'm currently working on, this is problematic, since we had to drop one replicate for QC reasons.

fruce-ki commented 2 years ago

Hi, thanks for getting in touch. It's been a while since I worked with isoform-level transcriptomic data, so I cannot answer this off the top of my head, and I don't have any suitable datasets lying around for a quick try. I will schedule it in for a closer look. Is it reasonably feasible for you to test it with a symmetric number of samples? Can you share what commands you used, and which kallisto version? And what leads you to the conclusion that the error you encountered is due to an unequal number of replicates? Cheers!

csoeder commented 2 years ago

Hi! I've already tested it on balanced experiments and it runs without error. I used kallisto v0.44.0:

> print(control_list)
[1] "quantification/kallisto/annotated/dm6_genes/7D_GROUPHOUSE_1"
[2] "quantification/kallisto/annotated/dm6_genes/7D_GROUPHOUSE_2"
[3] "quantification/kallisto/annotated/dm6_genes/7D_GROUPHOUSE_3"
> print(treatment_list)
[1] "quantification/kallisto/annotated/dm6_genes/FRULEXAFRU440_2"
[2] "quantification/kallisto/annotated/dm6_genes/FRULEXAFRU440_3"

> mydata <- fish4rodents(A_paths= control_list, B_paths= treatment_list, annot= dmelLookup.formal.df, scaleto=100000000)
[1] "Skipping conversion: abundance.h5 already in  quantification/kallisto/annotated/dm6_genes/7D_GROUPHOUSE_1"
[...]
[1] "Skipping conversion: abundance.h5 already in  quantification/kallisto/annotated/dm6_genes/FRULEXAFRU440_3"
Error in h5checktypeOrOpenLoc(file, readonly = TRUE, fapl = NULL, native = native) : 
  Error in h5checktypeOrOpenLoc(). Cannot open file. File 'NA/abundance.h5' does not exist.

I examined the source code in R/input.R and worked through it step by step; from what I could tell, the problem occurs around lines 80-90:

  # Load and convert.
  res <- lapply(c('A', 'B'), function(cond) {
    boots_A <- mclapply(1:lA, function(x) {
      # Get the correct files and scaling factors.
      if (cond=='A') {
        fil <- A_paths[x]
        sf <- sfA[x]
      } else {
        fil <- B_paths[x]
        sf <- sfB[x]
      }

This loop indexes the x-th element of the path vector, but it runs x from 1 to lA (the number of A paths) for both A_paths and B_paths. When there are fewer B samples than A samples, B_paths[x] returns NA for the extra indices; that NA is then treated as a directory name, giving 'NA/abundance.h5', which raises an error when it can't be found. The control structure of this snippet looks a little odd to me, and there's an earlier symmetric use of the lengths lA and lB, so I wonder if this is really what it's supposed to do?
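The behaviour described above can be reproduced in base R without RATS. The sketch below uses hypothetical path vectors (three A samples, two B samples) and a simplified version of the quoted loop that only builds file names; it is not the package's code. The "fixed" variant shows one possible approach (choosing lA or lB according to cond), not necessarily the change that was made in RATS.

```r
A_paths <- c("ctrl_1", "ctrl_2", "ctrl_3")  # hypothetical: three A samples
B_paths <- c("trt_2", "trt_3")              # hypothetical: two B samples
lA <- length(A_paths)
lB <- length(B_paths)

# Mirrors the quoted loop: x runs over 1:lA for both conditions.
buggy <- lapply(c('A', 'B'), function(cond) {
  vapply(1:lA, function(x) {
    fil <- if (cond == 'A') A_paths[x] else B_paths[x]
    # For cond 'B', B_paths[3] is NA_character_; file.path() renders it as "NA".
    file.path(fil, "abundance.h5")
  }, character(1))
})
print(buggy[[2]])

# Sketch of a fix: take the loop length from the condition being processed.
fixed <- lapply(c('A', 'B'), function(cond) {
  n <- if (cond == 'A') lA else lB
  vapply(seq_len(n), function(x) {
    fil <- if (cond == 'A') A_paths[x] else B_paths[x]
    file.path(fil, "abundance.h5")
  }, character(1))
})
print(fixed[[2]])
```

Here buggy[[2]] ends with "NA/abundance.h5", the same path as in the error message, while fixed[[2]] contains only the two real B paths.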

fruce-ki commented 2 years ago

Thanks for the info! It seems quite conclusive that there is an issue with imbalanced sets, caused by how the import wrapper is written. It was never the intention that asymmetric sets should not work; on the contrary, the test suite uses asymmetric data. But the import wrapper was an afterthought.

fruce-ki commented 2 years ago

Hello again @csoeder and sorry it took me so long.

I finally had some time to look into this. It appears the error is something I had already fixed at some point in the development branch of the package, but the fix never made it into master, as I was planning more things for the next update that I never got around to working on.

Could you please try the development branch and let me know if the fix works for you? If it does, I should probably release the update regardless of my original plans.