bartongroup / RATS

Relative Abundance of Transcripts: An R package for the detection of Differential Transcript isoform Usage.
MIT License

fish4rodents() input problem with data.table 1.11.0 #55

Closed fifdick closed 6 years ago

fifdick commented 6 years ago

Hey, I have been using RATs recently, mainly with salmon output, making use of the fish4rodents() function. It has always worked well, and I have been working on the same data the whole time (for about a week). Suddenly I cannot reproduce my results; rather, I get an error during the execution of fish4rodents(), using the same data and the same code. The only change that might have happened is that, while installing a new Bioconductor package, I updated all my other libraries. But I don't see how this could have influenced the reading of salmon input, as there was no update of wasabi or rats.

The code:

fish4rodents(case.dir.list, control.dir.list, annot, scaleto = libsize)

(The first two arguments are path lists, annot's column names are as recommended in the vignettes, and libsize is a vector of length length(case.dir.list) + length(control.dir.list).)
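For context, the inputs described above could be assembled along these lines. Everything here is hypothetical (directory names, sample IDs, and library sizes are made up for illustration); the actual call additionally needs rats installed and the salmon/wasabi output on disk. The ordering of libsize (cases first, then controls) is an assumption.

```r
# Hypothetical salmon output layout; real paths come from the user's runs.
case.dir.list    <- file.path("salmon_withBootstrap", c("1234", "1235"))
control.dir.list <- file.path("salmon_withBootstrap", c("2234", "2235"))

# One library size per sample, assumed to match the order in which
# the sample directories are given (cases, then controls).
libsize <- c(21e6, 19e6, 20e6, 22e6)

stopifnot(length(libsize) == length(case.dir.list) + length(control.dir.list))
```

With these in place, the call from the report would be fish4rodents(case.dir.list, control.dir.list, annot, scaleto = libsize).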

The Error:

Error in as.data.table.array(h5read(file.path(fil, "abundance.h5"), "/aux/ids")) :
as.data.table.array method should be only called for arrays with 3+ dimensions, for 2 dimensions matrix method should be used

The error appears after each sample's data has already been successfully converted, as the output tells me:

Tue May 8 11:05:53 2018
[1] "Successfully converted sailfish / salmon results in salmon_withBootstrap/1234 to kallisto HDF5 format"

I checked the code of the function and also went to check the wasabi code of the prepareFishForSleuth() function. No error comes up during the call of this function. Everything seems to go well until line 104 in RATS/R/input.R:

ids <- as.data.table( h5read(file.path(fil, "abundance.h5"), "/aux/ids") )

I unfortunately have no knowledge of h5 files and their structure, but it seems they were created incorrectly during conversion, even though no error was caught earlier?

I attached my session info.
I hope I provided all the necessary information; if not, please let me know.

Thanks in advance

sessionInfo.txt

fruce-ki commented 6 years ago

I also recently had some issues parsing .h5 files: kallisto .h5 output was imported full of NULLs, but I did not have issues with salmon+wasabi .h5 files.

Which Salmon version are you using? The latest, 0.9.1? Did you by any chance update any R packages (or any other tools) just before the problem appeared?

RATs has not changed, so whatever this new problem is, it must have been caused by a change in one of the other tools/packages involved: either wasabi or rhdf5. Wasabi has not been updated in two years, so my money is on rhdf5 being the cause. I will try to figure out what has happened; in the meantime, rolling back to an older version of rhdf5 may temporarily resolve the problem (let me know if you try it).

fifdick commented 6 years ago

Which Salmon version are you using? Latest 0.9.1?

"salmon_version": "0.9.1"

I did update R packages: I installed biomaRt from Bioconductor and let it update all dependencies. I can't make out what exactly made the difference. There should have been no change in wasabi or rhdf5, because there have been no updates in the last week?! I'm not sure how to undo the update, as I don't know which library is causing the problem, but I will try to find out. Thanks for investigating; looking forward to the result.

fruce-ki commented 6 years ago

I have the same versions as you. The update wouldn't need to have been issued in the last week, just installed around then. The new Bioconductor release did come out just over a week ago, and it does include a newer version of rhdf5. And then there are its dependencies, and their dependencies, etc.

I may be in over my head with this, and it may take a while to resolve.

Here's a workaround idea:

kallisto provides an h5dump subcommand. For me, that worked when fish4rodents() didn't. It creates a directory full of plaintext files. It is a bit tedious, but you could write a loop that parses these tables and collates the relevant columns into a list of tables, completely bypassing fish4rodents(). That is essentially what the function does anyway: collate selected columns from multiple tables.
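A minimal sketch of that collation idea. The per-sample files fabricated below are stand-ins for real h5dump output (the file name, column names, and values are illustrative, not kallisto's actual dump format); the loop itself shows the parse-and-collate pattern:

```r
# Fabricate two per-sample plaintext tables, standing in for the
# files kallisto's h5dump subcommand would write.
base <- tempfile("h5dump_demo_")
samples <- c("case1", "ctrl1")
for (s in samples) {
  d <- file.path(base, s)
  dir.create(d, recursive = TRUE)
  write.table(
    data.frame(target_id = c("tx1", "tx2"), est_counts = c(10, 20)),
    file.path(d, "abundance.tsv"),
    sep = "\t", quote = FALSE, row.names = FALSE
  )
}

# Loop over the samples, keep only the relevant columns, and
# collate them into a named list of tables -- essentially what
# fish4rodents() does with the HDF5 content.
boots <- lapply(samples, function(s) {
  tab <- read.table(file.path(base, s, "abundance.tsv"),
                    header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  tab[, c("target_id", "est_counts")]
})
names(boots) <- samples
```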

fifdick commented 6 years ago

kallisto provides an h5dump subcommand. For me, that worked when fish4rodents() didn't. It creates a directory full of plaintext files. It is a bit tedious, but you could write a loop that parses these tables and collates the relevant columns into a list of tables, completely bypassing fish4rodents(). That is essentially what the function does anyway: collate selected columns from multiple tables.

And then do the scaling during call_dtu()?

One more piece of information, though I don't know if it helps: the error basically says that the object has fewer than three dimensions and therefore cannot be converted to a data.table by the array method.

This is the failing line:

ids <- as.data.table( h5read(file.path(fil, "abundance.h5"), "/aux/ids") )

I tried the h5read() function on one of the salmon-converted datasets:

a <- h5read("<path>/case1234/abundance.h5", "/aux/ids")

What I get is:

typeof(a)
[1] "character"
mode(a)
[1] "character"
class(a)
[1] "array"
dim(a)
[1] 195013

and 195013 is the number of transcripts, weirdly. Or should it be like that? The contents of a seem to be the transcript IDs. as.data.table(as.character(a)) works fine, of course, and so does:

as.data.table( h5read(file.path(fil, "abundance.h5"), "/bootstrap") )

(I also tried simply installing an older version of rhdf5 (rhdf5_2.21.1.1); it did not change anything.)

fruce-ki commented 6 years ago

Yes, you can scale during call_dtu().

I don't remember off the top of my head how abundance.h5 is structured. But aux.ids is about IDs, so I'd expect the number of rows to be the same as the number of transcripts indeed.

fifdick commented 6 years ago

I worked around the issue by editing the mentioned lines, wrapping as.character() around the h5read() call. It worked. I'm now running call_dtu(), hoping not to run into another error (that could be the result of my workaround). I will keep you updated.


fruce-ki commented 6 years ago

Well, I'm out for today. Will get back on this tomorrow.

fruce-ki commented 6 years ago

Following your lead about the as.data.table() call being the issue, I've tried it on 3 systems. The two (Mac, Linux) with data.table 1.10.4 cast to data.table without issue. The third one (CentOS) has data.table 1.11.0, and that one crashes with:

Error in as.data.table.array(h5read("./salmon_quant/Hs_GRCh37.67.1/abundance.h5",  : 
  as.data.table.array method should be only called for arrays with 3+ dimensions, for 2 dimensions matrix method should be used

So it seems the source of the problem is data.table: the wrong as.data.table() method is being invoked by the generic as.data.table() call. Not sure if that is a bug or a deliberate feature... The object returned by h5read() self-identifies as an array, which is why the array method of as.data.table() gets called. However, it is just a one-dimensional array (a vector, really), and data.table has become fussy about that.

If that's ALL that's going on, it's an easy fix.
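If that diagnosis is right, no HDF5 file is needed to exercise the fix: any one-dimensional character array reproduces the dispatch situation, and dropping the dim attribute before conversion sidesteps it. A sketch, assuming data.table is installed (the error on the raw array is version-dependent; the wrapped conversion should work regardless):

```r
library(data.table)

# Mimic what h5read() returns for /aux/ids: transcript IDs stored
# as a one-dimensional array, not a plain character vector.
ids <- array(c("tx1", "tx2", "tx3"), dim = 3)
stopifnot(class(ids) == "array")  # so as.data.table() dispatches to the array method

# Dropping the dim attribute yields an ordinary character vector,
# which routes dispatch to a method that handles it in every version.
dt <- as.data.table(as.vector(ids))
```

The same idea works with as.character(), which is what the reporter patched into input.R as a stopgap.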

fifdick commented 6 years ago

Yes, exactly. Sounds logical to me! Thanks for investigating. In the meantime, to be able to go on with my analysis, I temporarily fixed it by converting the "array" to a character vector; with that, the import worked, and so did the subsequent call_dtu().


fifdick commented 6 years ago

For me it worked. I only had to edit the call applied to the ID "array" (in both group loops, A and B, of course). The other one,

as.data.table( h5read(file.path(fil, "abundance.h5"), "/bootstrap") )

worked just as it was.

(Replying to fruce-ki, who had asked:)

Really? While that fixes row 83, the same error should occur on row 85 and that one should be cast to numeric, not character.
