Open KuechlerO opened 1 year ago
Hi,
- event_annotation.tsv: What exactly are the columns (e.g. what do the columns genomic_start, and genomic_end display; why are not only splice-variants, but also the templates listed in this file?)
The columns genomic_start and genomic_end document the location, i.e., the start and end of the splicing event, on the genomic level. transcriptomic_start and transcriptomic_end document this on the transcriptomic level.
The templates are given such that there is a clear reference for each event. When there are multiple splice variants for the same reference, other events could arise between these two splice variants.
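To make the coordinate columns concrete, here is a minimal sketch in base R that reads a table shaped like event_annotation.tsv and derives the genomic span of each event. The toy rows, the transcript names, and the columns event_annotation, variant and template are assumptions for illustration; only genomic_start, genomic_end, transcriptomic_start and transcriptomic_end are named in this thread. The length calculation also assumes inclusive coordinates.

```r
# Toy stand-in for event_annotation.tsv. All values and transcript names are
# made up; the first three column names are an assumption about the file layout.
toy <- data.frame(
  event_annotation     = c("es", "ir"),
  variant              = c("geneA_variant", "geneB_variant"),
  template             = c("geneA_template", "geneB_template"),
  genomic_start        = c(1200L, 5400L),
  genomic_end          = c(1350L, 5520L),
  transcriptomic_start = c(301L, 210L),
  transcriptomic_end   = c(451L, 330L)
)

# Write the toy table to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back the same way a real event_annotation.tsv could be read.
events <- read.delim(path, stringsAsFactors = FALSE)

# Span of each event on the genome, assuming start and end are both inclusive.
events$genomic_length <- events$genomic_end - events$genomic_start + 1L

print(events[, c("variant", "template", "genomic_start", "genomic_end", "genomic_length")])
```

Each row pairs a variant with the template it was derived from, which is why both appear in the file.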
- sim_tx_info.txt: What is the difference between foldchange.V1, and foldchange.c2? (What is V1, and what is c2?)
The columns in sim_tx_info.txt correspond to the groups. I agree that their naming is confusing. This is because they only have indices internally. I will try to fix this.
Does this help? Best, Quirin
Cool, yes this helps. Thx for the quick reply! :)
Another point: It is not obvious to me, which group is control and which one is not. Or in general, how the groups are created.
In your example, you write:
# define, how many groups and samples per group you analyze. Here we create a small experiment with two groups with one sample per group:
num_reps = c(1,1)
Ok, so 2 groups means that one is control, and the other one is the variant group? Is the first group the control group? What happens if I choose >2 groups?
Currently, there is no clear distinction between the groups. This tool only supports the functionality provided by polyester. The fold changes documented in sim_tx_info.txt are introduced randomly.
In principle, the groups do not care about the variants, as polyester just simulates from the transcripts given by the ASimulatoR.
Mhm, ok. So the splitting in groups is just introduced for downstream fold change simulations with polyester?!
One more question:
Could you also please explain the exact effect of event_probs?
My understanding:
The event frequency gives the frequency for the specific variant to appear in the given gene (in each sample?!).
But then, why is the sum of the event frequencies restricted to sum(event_freq) == 1?
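My actual goal is:
1. Simulate RNAseq reads for the first group without any variants --> have it as controls
2. Simulate RNAseq reads for the second group with variants --> have this as patient cohort
So in the second group I would like to set specific genes to have splice variants. These should appear at specific frequencies, e.g. 100%, so that all reads follow a specific splice pattern --> e.g. a homozygous variant that destroys a splice site.
--> As far as I have understood, this could right now only be achieved by starting a separate run for each splice variant and setting event_freq=1.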
Am I right?
Thx for your help! :)
Mhm, ok. So the splitting in groups is just introduced for downstream fold change simulations with polyester?!
Yes, the groups are a parameter for polyester.
One more question: Could you also please explain the exact effect of event_probs? My understanding: The event frequency gives the frequency for the specific variant to appear in the given gene (in each sample?!). But then, why is the sum of the event frequencies restricted to sum(event_freq) == 1?
From the README:
Probability: For each superset we create an event with the probability mentioned in event_prob.
Frequency: Set probs_as_freq = T. The exon supersets are partitioned corresponding to the event_prob parameter.
and
Named list/vector containing numerics corresponding to the probabilities to create the event (combination). If probs_as_freq is TRUE, event_probs correspond to the relative frequency of occurrences for the event (combination) and in this case the sum of all frequencies has to be <=1.
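The difference between the two modes can be illustrated with a small base R simulation. This is only a sketch of the idea described in the README excerpt, not ASimulatoR's actual implementation: the number of supersets, the event names and the way events are drawn are all assumptions.

```r
set.seed(1)

n_supersets <- 1000                   # hypothetical number of exon supersets
event_probs <- c(es = 0.2, ir = 0.3)  # hypothetical event types and values

# Probability mode (sketch): every superset gets an independent draw per event
# type, so the number of events only matches n * p on average, and the values
# do not have to sum to 1.
prob_mode <- sapply(event_probs, function(p) rbinom(n_supersets, 1, p) == 1)
print(colSums(prob_mode))

# Frequency mode (sketch): the supersets are partitioned, so every superset
# lands in exactly one bin. The bins cannot cover more than the whole set,
# which is why the frequencies must sum to at most 1.
stopifnot(sum(event_probs) <= 1)
counts <- round(n_supersets * event_probs)
labels <- c(rep(names(counts), times = counts),
            rep("none", n_supersets - sum(counts)))
assignment <- sample(labels)          # shuffle which superset gets which event
print(table(assignment))
```

In the frequency sketch the counts are exact (here 200 and 300 out of 1000), and whatever is left over receives no event.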
My actual goal is:
1. Simulate RNAseq reads for first group without any variants --> Have it as controls
2. Simulate RNAseq reads for second group with variants --> Have this as patient cohort
So in the second group I would like to set for specific genes to have splice variants. This should appear at specific frequencies: E.g. 100%, so all reads are following a specific splice pattern --> E.g. homozygous variant that destroys a splice site.
ASimulatoR was created to benchmark event detection tools. If I understand correctly, you are analyzing differential splicing and isoform switching.
--> As far as I have understood, this could right now only be achieved by starting a separate run for each splice variant and setting the event_freq=1.
You could still create gtfs with splice events using ASimulatoR and then give this custom gtf to polyester with your own fold_change table.
This is not recommended, but should work. An example is attached. I added .txt because GitHub doesn't allow attaching Rscripts.
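For readers without the attachment, a custom fold change table for polyester could be built roughly as follows. This is a hedged sketch and not the attached script. The transcript names and the fold change values are hypothetical. The matrix layout (one row per transcript, one column per group in num_reps) follows polyester's documented fold_changes argument.

```r
# Two groups with one sample each, as in the README example.
num_reps <- c(1, 1)

# Hypothetical transcript IDs; in practice the rows must follow the order of
# the transcripts in the custom gtf created with ASimulatoR.
tx <- c("geneA_template", "geneA_variant", "geneB_template")

# Start with no change anywhere: one row per transcript, one column per group.
fold_changes <- matrix(1,
                       nrow = length(tx),
                       ncol = length(num_reps),
                       dimnames = list(tx, c("group1", "group2")))

# Treat group1 as "control" by convention: the template dominates there,
# the splice variant dominates in group2. The factor 4 is arbitrary.
fold_changes["geneA_template", "group1"] <- 4
fold_changes["geneA_variant", "group2"] <- 4

print(fold_changes)

# The matrix would then be passed on along these lines (not run here):
# polyester::simulate_experiment(gtf = ..., seqpath = ..., num_reps = num_reps,
#                                fold_changes = fold_changes, outdir = ...)
```

Each entry multiplies the baseline read count of that transcript in that group, so this shifts expression between template and variant rather than removing one of them completely. Whether a fold change of 0 behaves sensibly is something I have not verified.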
Hey guys, thx for the great tool!
I was just wondering whether I have missed something, or whether there is simply no further documentation on the output files available?
My specific questions: