GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data
GNU General Public License v3.0

Merging qname from BAM file to readId.x #435

Closed: hannalee809 closed this issue 4 months ago

hannalee809 commented 5 months ago

Hi!

I ran bambu on my 3 replicates together (output stored in se.multiSample) and am having trouble merging the read IDs from the BAM files with the se.multiSample output. I have been trying to merge on "readId.x" from metadata(se.multiSample)$readToTranscriptMaps[[1]] and "qname" from the BAM file. Any suggestions or thoughts would be really helpful, thank you!!
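For reference, a base R merge on two key columns with different names looks like this. The two toy tables are hypothetical stand-ins for the BAM-derived table and the bambu read map; their contents are invented for illustration, not real output.

```r
# Hypothetical stand-in for the table built from the BAM file (qname = read name)
unique_read_info <- data.frame(
  qname = c("read_a", "read_b", "read_c"),
  rname = c("chr1", "chr2", "chr1"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for one sample's read-to-transcript map
read_map <- data.frame(
  readId = c("read_b", "read_c", "read_d"),
  stringsAsFactors = FALSE
)

# Inner join on differently named key columns: only read IDs present in both survive
merged <- merge(read_map, unique_read_info, by.x = "readId", by.y = "qname")
print(merged)
```

Passing all.x = TRUE instead keeps read-map rows with no BAM match (filled with NA), which makes a mismatch easier to spot than a silently empty result.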

andredsim commented 4 months ago

Hi there,

Could you paste the head() of the two tables you are trying to merge so that I can get a better idea of the problem.

Kind Regards, Andre Sim

hannalee809 commented 4 months ago

Hi,

Thank you for your response! The tables I am trying to merge are unique_read_info, which contains information from the BAM file, and merged_data, which is metadata(se.multiSample)$readToTranscriptMaps[[1]] merged with the fullLengthCounts. Here is the head() of each table.

[Screenshot 2024-07-01 at 8:16:44 AM]

[Screenshot 2024-07-01 at 8:16:55 AM]

The qname and readId.x values do not match when I use se.multiSample, but they do match when I run bambu on each sample individually rather than together. In case it helps, here is the code I used to produce se.multiSample:

[Screenshot 2024-07-01 at 8:22:26 AM]

andredsim commented 4 months ago

Hi, thanks for sharing these. It looks like you are running bambu correctly; I do not see any issues there.

I just have a few more questions.

Is unique_read_info from a single BAM file (i.e., RNA_cell_naive1.bam)? If so, does that file correspond to the first name in names(metadata(se.multiSample)$readToTranscriptMaps)? Using metadata(se.multiSample)$readToTranscriptMaps[[1]] selects the read map for the first BAM file in that list, but that order might differ from the order in which you supplied the files.
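One way to avoid depending on list order is to select the read map by name. The sketch below assumes that the list names are the sample names derived from the BAM file names without the ".bam" extension; print names(metadata(se.multiSample)$readToTranscriptMaps) to confirm the actual format. The toy list is a hypothetical stand-in, deliberately ordered so that [[1]] is not RNA_cell_naive1.

```r
# Hypothetical stand-in for metadata(se.multiSample)$readToTranscriptMaps:
# a named list with one read map per sample (names and order are assumptions)
readToTranscriptMaps <- list(
  RNA_cell_naive2 = data.frame(readId = c("r4", "r5"), stringsAsFactors = FALSE),
  RNA_cell_naive1 = data.frame(readId = c("r1", "r2"), stringsAsFactors = FALSE),
  RNA_cell_naive3 = data.frame(readId = "r6", stringsAsFactors = FALSE)
)

bam_file <- "RNA_cell_naive1.bam"

# Drop the directory and the ".bam" extension to get the assumed sample name
sample_name <- sub("\\.bam$", "", basename(bam_file))

# Fail loudly if the assumed naming scheme does not hold
stopifnot(sample_name %in% names(readToTranscriptMaps))

# Select by name; positional [[1]] would return RNA_cell_naive2 in this toy list
read_map <- readToTranscriptMaps[[sample_name]]
print(read_map)
```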

When you try to merge unique_read_info and merged_data, what is the output: an error, or perhaps an empty table? For the examples you show, could you check whether "2bfee94f-8ea0-466e-aec4-ff14000f8cd1" %in% unique_read_info$qname returns TRUE?
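A quick diagnostic along these lines checks both a single read and the overall overlap between the two ID columns. The toy tables are hypothetical; only the first read ID is taken from the thread.

```r
# Hypothetical toy tables; replace with your own read map and BAM-derived table
read_map <- data.frame(
  readId = c("2bfee94f-8ea0-466e-aec4-ff14000f8cd1", "read_x", "read_y"),
  stringsAsFactors = FALSE
)
unique_read_info <- data.frame(
  qname = c("2bfee94f-8ea0-466e-aec4-ff14000f8cd1", "read_z"),
  stringsAsFactors = FALSE
)

# Is this particular read ID present in the BAM-derived table?
in_bam <- "2bfee94f-8ea0-466e-aec4-ff14000f8cd1" %in% unique_read_info$qname

# Fraction of read-map IDs that also appear in the BAM table;
# a value near 0 suggests the two tables come from different samples
match_rate <- mean(read_map$readId %in% unique_read_info$qname)

print(in_bam)
print(match_rate)
```

If the match rate is near zero for one read map but high for another, the likely cause is the list-order mismatch rather than the merge itself.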

How did you merge the full-length counts with the read-to-transcript map? They do not share any column that directly keys one into the other. There are ways around this, but it is a bit messy because a read can have multiple equal matches.
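Because one read can have several equal matches, a common first step is to expand the map to one row per (read, transcript) pair. This sketch assumes equalMatches is a list column holding integer transcript indices per read; that layout is an assumption to check with str() on your own read map, and the toy values are hypothetical.

```r
# Hypothetical toy read map with an assumed list column of transcript indices
read_map <- data.frame(
  readId = c("r1", "r2", "r3"),
  equalMatches = I(list(c(5L, 9L), 2L, integer(0))),
  stringsAsFactors = FALSE
)

# Number of equal matches per read
n_matches <- lengths(read_map$equalMatches)

# One row per (read, transcript) pair; reads with zero matches are repeated 0 times
long_map <- data.frame(
  readId = rep(read_map$readId, n_matches),
  txIndex = unlist(read_map$equalMatches),
  stringsAsFactors = FALSE
)
print(long_map)
```

Dropping reads with no equal match is a design choice; depending on the question, you might instead expand the compatible matches (assuming your read map has such a column) or keep unmatched reads with an NA index.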

What final output do you want to produce, or what question are you trying to answer? If I know that, I may be able to suggest an alternative way of getting there.

Kind Regards, Andre Sim

hannalee809 commented 4 months ago

Hi!

After looking more closely at the code, I was able to resolve the merging issue! Essentially, I merged the metadata (the read-to-transcript map) with the rowData of se.multiSample, which gave me the columns needed to merge with the full-length counts. Because this involved merging several different data frames, it did get complicated and I had to be careful at each step. Thank you so much for your support!
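The approach described above can be sketched as two steps: translate each transcript index in the read map into annotation from rowData, then merge on a shared transcript column. This is a hypothetical sketch, not the poster's exact code. It assumes the indices point at row positions of se.multiSample and that rowData has TXNAME and GENEID columns; the counts table stands in for one sample's column of the fullLengthCounts assay keyed by transcript name. All values are invented.

```r
# Hypothetical stand-in for rowData(se.multiSample); column names are assumed
tx_info <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3"),
  GENEID = c("g1", "g1", "g2"),
  stringsAsFactors = FALSE
)

# Hypothetical per-sample full-length counts keyed by transcript name
counts <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3"),
  fullLengthCount = c(10, 0, 4),
  stringsAsFactors = FALSE
)

# Read map already expanded to one row per (read, transcript index) pair
long_map <- data.frame(
  readId = c("r1", "r1", "r2"),
  txIndex = c(1L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Step 1: translate row indices into transcript annotation
long_map$TXNAME <- tx_info$TXNAME[long_map$txIndex]
long_map$GENEID <- tx_info$GENEID[long_map$txIndex]

# Step 2: the read map now shares the TXNAME key with the count table
read_tx_counts <- merge(long_map, counts, by = "TXNAME")
print(read_tx_counts)
```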