COMBINE-lab / minnow


No information on true UMI sequences? #18

Closed ColeWunderlich closed 2 years ago

ColeWunderlich commented 3 years ago

Hello,

Is there a way to get the true UMI sequence for each read?

So far after running minnow my read names look like this (example from R1 file):

@AAACCTGTCTTGTTTG:CDR1:159102:0:0
AAACCTGTCTTGTTTGCGGGGTTAGC
+
NNNNNNNNNNNNNNNNNNNNNNNNNN

The cell barcode is clearly in the read name, but there is no information about the true UMI sequence for the read. I also cannot find this information anywhere else in the minnow output.

This information seems critical since the observed UMI sequence may contain PCR errors.

Is this a bug or does minnow not normally give you information about the true UMI sequence for each read?
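To make the example above concrete, here is a minimal sketch of how the R1 record can be split into its parts. It assumes a 16 bp cell barcode followed by a 10 bp UMI, which matches the 26-base read shown but is not confirmed anywhere in this thread. The helper name parse_r1_record and the R1Record type are hypothetical, and the meaning of the name fields after the barcode (CDR1, 159102, 0, 0) is not documented here, so they are kept as opaque strings.

```python
from typing import NamedTuple, Tuple

# Assumed layout of R1: 16 bp cell barcode + 10 bp UMI (not confirmed by minnow docs).
CB_LEN = 16
UMI_LEN = 10


class R1Record(NamedTuple):
    name_barcode: str             # first ':'-separated field of the read name
    name_fields: Tuple[str, ...]  # remaining name fields, meaning not documented here
    observed_barcode: str         # barcode bases as they appear in the read
    observed_umi: str             # UMI bases as they appear in the read (may carry errors)


def parse_r1_record(header: str, sequence: str,
                    cb_len: int = CB_LEN, umi_len: int = UMI_LEN) -> R1Record:
    """Split one R1 FASTQ header/sequence pair into barcode and UMI parts."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    if len(sequence) < cb_len + umi_len:
        raise ValueError("read is shorter than barcode + UMI")
    fields = header[1:].strip().split(":")
    return R1Record(
        name_barcode=fields[0],
        name_fields=tuple(fields[1:]),
        observed_barcode=sequence[:cb_len],
        observed_umi=sequence[cb_len:cb_len + umi_len],
    )


record = parse_r1_record("@AAACCTGTCTTGTTTG:CDR1:159102:0:0",
                         "AAACCTGTCTTGTTTGCGGGGTTAGC")
print(record.name_barcode, record.observed_barcode, record.observed_umi)
```

With this record the barcode in the name and the barcode in the read agree, but the name carries nothing to compare the observed UMI against, which is the gap described above.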

hiraksarkar commented 3 years ago

Hi @ColeWunderlich ,

Thanks for using minnow, and apologies for the late reply. I am going through the issues and trying to address them as best I can.

I just checked the code to see whether we write the original UMI, and we don't report it. We only keep the original cell barcode name (https://github.com/COMBINE-lab/minnow/blob/minnow-velocity/src/MinnowSimulate.cpp#L843). It can of course be reported, though that could result in a bigger file size.

I can certainly push a change to add that. Are you generating the data from Splatter and using the custom cell barcode names? If so, I will change it accordingly.

Thanks,
Hirak

ColeWunderlich commented 3 years ago

Hey @hiraksarkar thanks for getting back to me.

It would be great to have the true UMI in the read name, and I appreciate you being willing to add it as a feature. I'm guessing this would be tagSeq in the code you linked to? Also, I have been assuming that the modifiedCellName being output here has no PCR error in its sequence. Is that correct?

Yes, so far I have been using splatter mode. I want to use the output from an alevin run, but so far I get zero genes whenever I run in alevin mode. My workaround has been to transpose the alevin matrix, convert it to a CSV (which takes hours for pandas to write to disk), and then feed that into minnow in splatter mode with the --custom flag. (I also swap the row and column files so that they match the transposed matrix.)
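For readers following the workaround, here is a rough sketch of the transpose-and-swap step using only the Python standard library. It works on a small dense matrix held in memory, so it illustrates the bookkeeping rather than a fast path for a full alevin matrix. The function names are hypothetical, and the exact CSV layout minnow expects (header or not, delimiter) is an assumption that should be checked against the minnow documentation.

```python
import csv
import io


def transpose_counts(matrix, row_names, col_names):
    """Transpose a dense count matrix and swap its row/column labels.

    Returns (transposed_matrix, new_row_names, new_col_names).
    """
    if len(matrix) != len(row_names):
        raise ValueError("number of rows does not match row_names")
    if any(len(row) != len(col_names) for row in matrix):
        raise ValueError("row length does not match col_names")
    transposed = [list(column) for column in zip(*matrix)]
    # The old column labels now describe rows, and vice versa.
    return transposed, list(col_names), list(row_names)


def write_counts_csv(matrix, handle):
    """Write the matrix as comma-separated values with no header (assumed layout)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerows(matrix)


# Toy example: 2 cells x 3 genes becomes 3 genes x 2 cells.
counts = [[1, 0, 2],
          [0, 3, 4]]
transposed, new_rows, new_cols = transpose_counts(
    counts, ["cellA", "cellB"], ["g1", "g2", "g3"])

buffer = io.StringIO()
write_counts_csv(transposed, buffer)
print(buffer.getvalue())
```

The point of returning the labels together with the matrix is that the row and column files stay in step with the transposed data, which is the swap mentioned above.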

hiraksarkar commented 3 years ago

Hi @ColeWunderlich ,

I have mostly developed the splatter-based mode because it has become the preferred choice for most users. What you did by transposing and using the --custom flag should be right.

modifiedCellName should be the unchanged version of the cell barcode. I just added the UMI to it in my latest commit. Let me know if that works.

Thanks

ColeWunderlich commented 3 years ago

Hey @hiraksarkar,

Sorry for taking so long to get back to you.

That makes sense. From reading some of the old issues I got the impression that splatter-mode was the only mode currently supported.

I applied the new fix and performed a test run, and everything looks like it is working as expected. Thanks for the update!

I also have a few questions about how to run minnow properly. Would this thread be a good place to discuss them?

hiraksarkar commented 3 years ago

Hi @ColeWunderlich ,

Feel free to ask here or drop me an email at hiraksarkar.cs@gmail.com, and we can chat more. If you want to discuss the modes or modifications, I won't close the issue. Please let me know.

Thanks

hiraksarkar commented 2 years ago

Closing this due to inactivity.