bartongroup / RATS

Relative Abundance of Transcripts: An R package for the detection of Differential Transcript isoform Usage.
MIT License

Need to Use Rep() in call_dtu #71

Closed Tenayav closed 1 year ago

Tenayav commented 1 year ago

Hi!

I am a biochemist learning to code, and I wanted to reach out about some trouble I'm having running RATs on my data. My research question is to find out which isoform is more abundant in one sample compared to another, and RATs seemed like a good fit. I'm using data from Salmon, run in NF-core, in TPMs. The details are below. Any suggestions would be most helpful! Thank you for putting together RATs and maintaining it!

The problem that I am running into is a mismatch between item numbers.

Error in `[.data.table`(Genes, , `:=`(elig_transc, Transcripts[, as.integer(sum(elig, :
  Supplied 21207 items to be assigned to 59385 items of column 'elig_transc'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

Version: Relative Abundance of Transcripts v.0.6.7
TPM data.table from Salmon -> See screenshot
Annot, mycond_A, and mycond_B -> See screenshots

The data should be in TPM. I'm still trying to figure out whether it is bootstrapped or not, but I figured I could get started with what I have. I ended up rounding my data to get integers; otherwise there are 6 decimal places.

My Script:

targets <- txs_tpm[c(1)]
frac2 <- data.table(txs_tpm[c(1,6,16)])   # Simulated abundances for fraction 2
frac2_A <- round(frac2[,c(2,3)], digits=0)
frac2 <- data.table(cbind(targets, frac2_A))
colnames(frac2) <- c("target","V1","V2")

frac5 <- data.table(txs_tpm[c(1,9,19)])   # Simulated abundances for fraction 5
frac5_A <- round(frac5[,c(2,3)], digits=0)
frac5 <- data.table(cbind(targets, frac5_A))
colnames(frac5) <- c("target","V1","V2")

annot <- data.frame(txs_tpm[c(1,3)])      # Transcript and gene IDs for the above data.
colnames(annot) <- c("target_id", "parent_id")

mydtu <- call_DTU(annot= annot, count_data_A= frac2, count_data_B= frac5,
                  verbose=TRUE, scaling=1,
                  name_A= "Fraction 2", name_B= "Fraction 5", varname= "Fractions",
                  description="Comparison of Fraction 2 and Fraction 5 Cell Extracts from Rep1 after Threshold and in TPMS. RATs analysis.")

Attempts to fix the error:
rep(call_dtu(....)) -> Same error message
rep_lin(call_dtu(...), nrow(dat)) -> Same error message
Running a smaller number of genes, 3 or even 11 -> Still the same problem

I don't have this problem when I run the Simdata, so I figure it's a problem with my data, not the script.
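For anyone landing here with the same message: the rep() hint refers to the right-hand side of a data.table `:=` assignment, which in this case sits inside call_DTU() (the error names the internal `Genes` table), so wrapping the whole call in rep() cannot change it. A minimal sketch of how data.table produces this kind of error, using toy tables whose names are illustrative and not taken from the RATs source:

```r
library(data.table)

# Toy table; the names are illustrative only, not the RATs internals.
genes <- data.table(parent_id = c("g1", "g2", "g3", "g4", "g5"))
per_gene <- c(2L, 1L)   # only 2 values for a 5-row table

# Recent data.table versions refuse to recycle a shorter vector in `:=`.
msg <- tryCatch(
  {
    genes[, n_transc := per_gene]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With one value per row, the same assignment succeeds.
genes[, n_transc := c(2L, 1L, 3L, 1L, 2L)]
print(genes)
```

Because the mismatch arises between tables built inside the function, the useful things to inspect are the inputs and the installed package versions rather than the call itself.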

[Screenshots: TPM_datatable, mycond_B_Frac5, mycond_A_Frac2, myannot]

fruce-ki commented 1 year ago

Hi!

The error is not making much sense to me and I do not see any obvious issue in the screenshots.

Can you please attach a small subset of the data tables and the corresponding subset of the annotation (that still produces the error for you) as actual text files so that I can try to run them and poke around?
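One way to produce such a subset, sketched in base R under stated assumptions: the toy tables below stand in for the real annotation and abundance tables, the column names follow the script earlier in the thread, and the file names and the choice of the first two genes are arbitrary.

```r
# Toy stand-ins for the real tables (values are made up for illustration).
annot <- data.frame(target_id = c("t1", "t2", "t3", "t4", "t5"),
                    parent_id = c("g1", "g1", "g2", "g2", "g3"),
                    stringsAsFactors = FALSE)
cond_A <- data.frame(target = annot$target_id,
                     V1 = c(10, 5, 0, 7, 3), V2 = c(12, 4, 1, 6, 2),
                     stringsAsFactors = FALSE)
cond_B <- data.frame(target = annot$target_id,
                     V1 = c(2, 9, 4, 1, 3), V2 = c(3, 8, 5, 0, 4),
                     stringsAsFactors = FALSE)

# Keep a handful of genes, then the matching transcripts in every table.
keep_genes <- head(unique(annot$parent_id), 2)
sub_annot  <- annot[annot$parent_id %in% keep_genes, ]
sub_A      <- cond_A[cond_A$target %in% sub_annot$target_id, ]
sub_B      <- cond_B[cond_B$target %in% sub_annot$target_id, ]

# row.names = FALSE avoids writing an extra row-number column.
out_dir <- tempdir()
write.csv(sub_annot, file.path(out_dir, "myannot.csv"),  row.names = FALSE)
write.csv(sub_A,     file.path(out_dir, "mycond_A.csv"), row.names = FALSE)
write.csv(sub_B,     file.path(out_dir, "mycond_B.csv"), row.names = FALSE)
```

Subsetting by gene rather than by row keeps all isoforms of each retained gene together, which matters for an isoform-usage comparison.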

Tenayav commented 1 year ago

Of course! Thank you for taking a look at this. I downloaded them as csv files. Hope those work for text files!

Best,
Tenaya

myannot.csv mycond_B.csv mycond_A.csv

fruce-ki commented 1 year ago

[Screenshot: 2023-03-06 at 19 38 04]

CSV is excellent for the job. I assume the row number column is an artifact of exporting without explicitly telling it not to write the rownames. So I deleted that column from each file.

I then ran your files, and I got no error whatsoever. See screenshot. What are you doing differently?

fruce-ki commented 1 year ago

Can you do the checks as in my screenshot and post the verbose console output from those and from call_DTU() down to and including the error? That would narrow down what section is throwing the error.
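The checks in the screenshot are not reproduced in this text, so the following is only a sketch of the kind of input sanity checks that can be run before call_DTU(). It assumes tables shaped like those in the script earlier in the thread, and the toy values are made up.

```r
# Toy inputs shaped like the tables in this thread (values are illustrative).
annot <- data.frame(target_id = c("t1", "t2", "t3", "t4"),
                    parent_id = c("g1", "g1", "g2", "g2"),
                    stringsAsFactors = FALSE)
cond_A <- data.frame(target = c("t1", "t2", "t3", "t4"),
                     V1 = c(10, 5, 0, 7), V2 = c(12, 4, 1, 6),
                     stringsAsFactors = FALSE)
cond_B <- data.frame(target = c("t1", "t2", "t3", "t4"),
                     V1 = c(2, 9, 4, 1), V2 = c(3, 8, 5, 0),
                     stringsAsFactors = FALSE)

checks <- list(
  n_transcripts  = nrow(annot),
  n_genes        = length(unique(annot$parent_id)),
  duplicated_ids = anyDuplicated(annot$target_id) > 0,
  # Transcripts present in the data but absent from the annotation.
  A_not_in_annot = setdiff(cond_A$target, annot$target_id),
  B_not_in_annot = setdiff(cond_B$target, annot$target_id),
  # Both conditions should describe the same set of transcripts.
  same_targets   = setequal(cond_A$target, cond_B$target)
)
str(checks)
```

Counts like these can be compared against the numbers quoted in an error message to see which table they correspond to.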

Tenayav commented 1 year ago

Sure thing! It was easy to run. Here is a screenshot and text file of the checks and error message. Thank you for your help! The row number is an artifact. Sorry about that!

[Screenshot: 2023-03-06 at 12 10 43 PM]

2023_03_06_RATs_Kimon_Checksrequest.txt

Tenayav commented 1 year ago

Oh and to cover bases, I did a clean sweep of my environment before trying the above. Just in case!

fruce-ki commented 1 year ago

I don't see anything wrong with that. Something must have changed in the 3rd party libraries.

Can you post your sessionInfo()?
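A short sketch of how that output can be saved to a text file for attaching. The file name is arbitrary, and the data.table lookup is included only because the error above originates in a data.table call.

```r
# Capture the printed session information as a character vector.
info <- capture.output(sessionInfo())

# Write it to a plain-text file (the name and location are arbitrary).
out_file <- file.path(tempdir(), "sessioninfo.txt")
writeLines(info, out_file)

# Versions of R and of a specific dependency can also be printed directly.
print(R.version.string)
if (requireNamespace("data.table", quietly = TRUE)) {
  print(packageVersion("data.table"))
}
```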

Tenayav commented 1 year ago

Can do! Here it is!

2023_03_06_Sessioninfo.txt

fruce-ki commented 1 year ago

So you have older versions of R etc.

I tried to install the same versions as you, but the combination of versions is not available for Mac. Likely the issue is in one of the dependencies.

Can you try upgrading to newer versions and see if that fixes your problem?

Tenayav commented 1 year ago

Definitely trying. Our institution's server is sorely behind on updating R and it makes things complicated for us. I'm currently struggling on GRanges, but hoping to have all the dependencies updated for R/4.2.2 version soon.

fruce-ki commented 1 year ago

You should have more freedom with conda environments within your user space. Linux should have all the "normal" versions of R, unlike Mac.

Tenayav commented 1 year ago

Sorry for taking so long. I had to update everything and switch to my personal computer because of admin limitations on both my school server and my work laptop. Your suggestion worked perfectly! Thank you! Feel free to mark this as resolved.

fruce-ki commented 1 year ago

Good to hear! I am glad it worked out. There's probably documentation I need to update regarding recommended versions for dependencies...