Closed: YingYa closed this issue 5 years ago.
Hi @YingYa, This looks like it should be easy to support during preprocessing. Are there any unique properties of this data type compared to 10x? e.g. what is the barcode length? read length? avg. reads per barcode? If these are at least comparable to 10x then we can likely just add a preprocessing flag for this data type and run EMA as is.
Hi @arshajii, there are some properties of the stLFR data:
Hi @arshajii,
What does each parameter mean? I'd like to create a new `PlatformProfile` in `src/techs.c` for stLFR.
Thanks
Here's a brief description of each; let me know if you need any more info about any parameter.
`name`: Unique name string for the platform.

`extract_bc`: Pointer to a function that extracts the barcode from a `FASTQRecord` object. Probably the easiest thing to do in your case would be to format your FASTQs the way EMA expects (with `:<barcode sequence>` after the FASTQ identifier, like `@read1:ACGTACGT`), then just use the 10x barcode parsing function, `extract_bc_10x`.
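If it helps, that reformatting step could be sketched roughly like this in C. This is my own illustration, not EMA code; `BC_LEN` and the exact barcode placement are assumptions you'd need to adjust for stLFR:

```c
/* Sketch: move a barcode from the 3' end of a read into the FASTQ
 * identifier, producing the `@<id>:<barcode>` form that extract_bc_10x
 * can parse. BC_LEN is a placeholder for the real stLFR barcode length. */
#include <stdio.h>
#include <string.h>

#define BC_LEN 10  /* assumption: barcode occupies the last BC_LEN bases */

/* Writes "@<id>:<barcode>" into out and trims the barcode off seq.
 * Returns 0 on success, -1 if the read is shorter than the barcode. */
int move_barcode_to_header(const char *id, char *seq, char *out, size_t outlen)
{
    size_t n = strlen(seq);
    if (n < BC_LEN)
        return -1;
    snprintf(out, outlen, "@%s:%s", id, seq + n - BC_LEN);
    seq[n - BC_LEN] = '\0';  /* remove the barcode bases from the sequence */
    return 0;
}
```

Remember to trim the corresponding bases off the quality string as well, so sequence and quality stay the same length.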
`many_clouds`: Some technologies (e.g. Moleculo) can have many reads per barcode, which necessitates slight changes to the algorithm. It looks like your technology doesn't have too many reads per barcode, though, so you can just set this to `0`.
`dist_thresh`: Distance threshold to use when grouping alignments into clouds. If your fragment lengths are similar to 10x's, you can probably use the same value of 50k; otherwise, scaling this proportionally would probably be fine.

`error_rate`: Per-nucleotide error rate of the sequencer (e.g. 10x has a 0.1% error rate, so we set this to `0.001`).
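To make the role of `dist_thresh` concrete, here's a toy illustration (not EMA's actual clustering code) of how a distance threshold partitions sorted alignment positions into clouds:

```c
/* Toy illustration of dist_thresh: consecutive sorted alignment
 * positions within dist_thresh of each other fall into the same cloud;
 * a larger gap starts a new cloud. */
#include <stddef.h>

/* Returns the number of clouds formed from n sorted positions. */
size_t count_clouds(const long *pos, size_t n, long dist_thresh)
{
    if (n == 0)
        return 0;
    size_t clouds = 1;
    for (size_t i = 1; i < n; i++)
        if (pos[i] - pos[i - 1] > dist_thresh)
            clouds++;  /* gap exceeds the threshold: start a new cloud */
    return clouds;
}
```

With a 50k threshold, reads 10k apart stay in one cloud while reads 160k apart split into two, which is why the threshold should scale with typical fragment length.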
`n_density_probs`/`density_probs`: This encodes the probability of seeing a particular number of reads in a 1 kb window within a fragment. For example, for 10x we have `density_probs = [0.6, 0.05, 0.2, 0.01]` (and therefore `n_density_probs`, the length of `density_probs`, is 4); this means there's a 60% chance of seeing zero reads in a 1 kb window, a 5% chance of seeing one read, a 20% chance of seeing two reads (i.e. one read pair), and a 1% chance of seeing three reads. Probabilities for higher read counts are automatically scaled down exponentially, which is why these don't sum to 1. If you don't plan to use EMA's read-density optimization feature, you can ignore all this. Otherwise, the best way to determine these probabilities is to do a regular alignment, look at uniquely-mapping fragments, and build a histogram of read counts per 1 kb window.

Will EMA support mapping reads from stLFR? Is this solved?
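That histogram step could be sketched as follows. This is my own helper, not part of EMA; it just turns per-window read counts (tallied from uniquely-mapping fragments of a regular alignment) into the first few `density_probs` entries:

```c
/* Sketch: estimate density_probs from per-1kb-window read counts.
 * probs[k] becomes the fraction of windows containing exactly k reads;
 * windows with counts >= N_DENSITY_PROBS are ignored, matching the
 * idea that higher counts are extrapolated rather than tabulated. */
#include <stddef.h>

#define N_DENSITY_PROBS 4  /* same length as the 10x profile */

void estimate_density_probs(const int *counts, size_t n,
                            double probs[N_DENSITY_PROBS])
{
    size_t hist[N_DENSITY_PROBS] = {0};
    for (size_t i = 0; i < n; i++)
        if (counts[i] >= 0 && counts[i] < N_DENSITY_PROBS)
            hist[counts[i]]++;
    for (size_t k = 0; k < N_DENSITY_PROBS; k++)
        probs[k] = n ? (double)hist[k] / (double)n : 0.0;
}
```

For instance, if 60% of windows are empty and 20% hold a read pair, this reproduces probabilities in the same shape as the 10x values above.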
Will EMA support mapping reads from stLFR (http://dx.doi.org/10.1101/324392), where the uncorrected barcode is at the 3'-end of read 2?