arshajii / ema

Fast & accurate alignment of barcoded short-reads
http://ema.csail.mit.edu
MIT License

Support for stLFR reads #24

Closed YingYa closed 5 years ago

YingYa commented 6 years ago

Will EMA support mapping reads from stLFR (http://dx.doi.org/10.1101/324392), where the uncorrected barcode is at the 3' end of read2?

arshajii commented 6 years ago

Hi @YingYa, This looks like it should be easy to support during preprocessing. Are there any unique properties of this data type compared to 10x? e.g. what is the barcode length? read length? avg. reads per barcode? If these are at least comparable to 10x then we can likely just add a preprocessing flag for this data type and run EMA as is.

YingYa commented 6 years ago

Hi @arshajii, here are some properties of the stLFR data:

  1. The raw data were sequenced on a BGISEQ-500 with read lengths of 100 bp (read1) + 100 bp (read2) + 54 bp (barcode tail appended to read2).
  2. The full barcode is a random combination of three sub-barcodes (B1, B2, B3), all located in the final 54 bp of read2.
  3. There are 1536 sub-barcodes, each with a fixed length of 10 bp. See 'https://www.biorxiv.org/highwire/filestream/99369/field_highwire_adjunct_files/1/324392-2.xlsx' for details.
  4. The final 54 bp of read2 contain the three sub-barcodes (B1, B2, B3) and two primers (P1, P2): 54 bp (positions 101 to 154) = B1 (10 bp) + P1 (6 bp) + B2 (10 bp) + P2 (18 bp) + B3 (10 bp).
  5. One mismatch may be allowed per sub-barcode when processing the data.
  6. The average number of reads per barcode would be about 40 to 50 in 100 Gb of data.
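
To make the layout in items 4 and 5 concrete, here is a minimal sketch of how the read2 tail could be split and corrected during preprocessing. It is not EMA's code: the function names (`split_read2`, `correct`, `barcode_for`) and the `whitelist` argument are hypothetical, the whitelist is assumed to be the sub-barcode set from the spreadsheet in item 3, and discarding ambiguous matches is an assumption rather than a documented stLFR rule.

```python
# Layout of the 54 bp tail of read2 (0-based offsets within the tail):
# B1 [0, 10) + P1 [10, 16) + B2 [16, 26) + P2 [26, 44) + B3 [44, 54)
SUB_BARCODE_LEN = 10
TAIL_LEN = 54
OFFSETS = (0, 16, 44)  # start of B1, B2, B3 inside the tail


def split_read2(seq, insert_len=100):
    """Return (genomic part of read2, [B1, B2, B3]) for a raw read2 sequence."""
    if len(seq) < insert_len + TAIL_LEN:
        raise ValueError("read2 is shorter than insert + barcode tail")
    tail = seq[insert_len:insert_len + TAIL_LEN]
    subs = [tail[o:o + SUB_BARCODE_LEN] for o in OFFSETS]
    return seq[:insert_len], subs


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct(sub, whitelist, max_mismatch=1):
    """Map an observed sub-barcode to a whitelist entry.

    Returns None when nothing is within max_mismatch, or when more than
    one entry is (assumption: ambiguous sub-barcodes are discarded).
    """
    if sub in whitelist:
        return sub
    hits = [w for w in whitelist if hamming(sub, w) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


def barcode_for(seq, whitelist):
    """Return (genomic part of read2, corrected 30 bp barcode or None)."""
    insert, subs = split_read2(seq)
    fixed = [correct(s, whitelist) for s in subs]
    if None in fixed:
        return insert, None
    return insert, "".join(fixed)
```

With a barcode recovered this way, each read pair could be rewritten as a plain 100 + 100 bp pair carrying its barcode, which is roughly what a preprocessing step for a new data type would need to produce.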

YingYa commented 6 years ago

Hi @arshajii,

I want to create a new PlatformProfile in 'src/techs.c' for stLFR. What does each parameter mean?

Thanks

arshajii commented 6 years ago

Here's a brief description of each; let me know if you need any more info about any parameter.

zhangtongda commented 5 years ago

Will EMA support mapping reads from stLFR? Has this been solved?

arshajii commented 5 years ago

@zhangtongda: Check out @YingYa's fork. It looks like he was adding stLFR support. For now I'll close this issue as we don't have plans to add this ourselves; hopefully @YingYa can reply to let us know the status of his fork and if he was able to test on stLFR data.