Closed mars188 closed 3 years ago
Hi there,
The output of the miRNA module is a count table where ID is a unique entry with counts for each sample in columns. You can input this table into any differential expression analysis package (e.g. DESeq2, limma, etc).
Here is your example provided:
ID | sample1 | sample2 | sample3 |
---|---|---|---|
hsa-let-7a-3p&NA&iso-21-4L2MUYWZD | 21 | 5 | 13 |
hsa-let-7a-3p&iso_3p:+1&iso-22-4L2MUYWZJ | 55 | 64 | 101 |
hsa-let-7a-3p&iso_3p:+4,iso_add3p:2,iso_5p:+3,iso_snv_central&iso-22-UOW5X7Y7M | 10 | 4 | 22 |
hsa-let-7a-3p&iso_3p:+4,iso_snv_central,iso_5p:+3,iso_add3p:2&iso-22-UOW5X7Y7N | 34 | 46 | 40 |
hsa-let-7a-3p&iso_3p:+4,iso_snv_seed,iso_add3p:2,iso_5p:+3&iso-22-UXW597Y7N | 35 | 3 | 15 |
XICRA generates this table containing information at the isoform miRNA (isomiR) level (variant modifications of the canonical miRNAs as described in miRBase) and the ID column, contains all available information for each entry (separated by '&').
For example, take the ID = _hsa-let-7a-3p&iso_3p:+1&iso-22-4L2MUYWZJ,
_ it contains information of the miRNA (hsa-let-7a-3p), the modification from the original sequence in mirBase (iso_3p:+1) and the Unique Identifier (UID, unique sequence plate as generated by miRTOP). See here:
miRNA | variant type | UID |
---|---|---|
hsa-let-7a-3p | iso_3p:+1 | iso-22-4L2MUYWZJ |
Variant annotation is generated categorizing isomiRs into classes based on their sequence modifications: iso_5p, iso_3p, iso_add, iso_snv, iso_snv_seed, iso_snv_central_offset, iso_snv_central, iso_snv_central_suppl, following miRTOP suggested classification scheme. NA variants correspond to canonical isomiRs. See additional information on each variant in the miRTOP original publication https://pubmed.ncbi.nlm.nih.gov/31504201/.
In the example you provided here for hsa-let-7a-3p miRNA, the most abundant isoform is not the cannonical but the modified isomiR iso_3p+1.
XICRA.stats
is an R package that allows you to use the matrix table generated either at the isomiR, miRNA canonical or a variant type.
See details here: https://github.com/HCGB-IGTP/XICRA.stats You basically install it using devtools and using as follows:
# Install XICRA.stats version from GitHub:
# install.packages("devtools")
devtools::install_github("HCGB-IGTP/XICRA.stats")
data_XICRA <- XICRA.stats::parse_XICRA(path_to_matrix_file)
data_XICRA
is a list containing dataframes with the data group at different levels: isomiR, miRNA canonical or variant.
See an example using XICRA.stats
for the data analysis we use in the original publication of XICRA.stats
here:
https://github.com/HCGB-IGTP/XICRA/blob/master/BMC_bioinformatics_paper/code/GSE114923_DESeq2_analysis.R
Best regards
Thank you soooo much for the nice explanation. Now, I understand how output looks like. Just have two small (may be) question:
Hi there,
You are welcome! I hope you find it useful.
I would answer your questions.
You are right, numbers (e.g. 21, 5 and 13) are raw counts for each isomiR and each sample. In this case, using the software you specify in the miRNA module (e.g. miraligner), counts were obtained for each isomiR and sample, format was standarized using miRTop.
Yes, definitely. This is what I meant when I recommended to use XICRA.stats
to group counts for the same miRNA or the same variant.
Basically do:
data_XICRA <- XICRA.stats::parse_XICRA(path_to_matrix_file)
And as a results, data_XICRA
is a list containing dataframes (isomir_data
, miRNA_data
and variant_data
) with the data grouped at different levels: isomiR, miRNA canonical or variant type.
You can also use packages such as tidyr
or dplyr
to separate IDs (by &) and group_by counts at the miRNA level or the desired level.
Following the second question, you can just use the data at the miRNA level but take into account some new discoveries regarding the robustness and better discrimiation of results using isomiRs vs miRNA https://pubmed.ncbi.nlm.nih.gov/31882820/ https://pubmed.ncbi.nlm.nih.gov/28206648/
On the other hand, take into account potential problems regarding validation and amplification of single isomiRs as non template variants might be present and PCR primers might produce low performance.
I would recommend to make the analysis at both levels, at the miRNA and the isomiR level and check for consistencies of results and putative candidates.
Best regards!
Thank you so much. You are very knowledgeable and been extremely useful in my analysis.
Ok, I will conduct XCIRA.stat analysis and update you if I need any further help.
Thank youuuu again for all your time!
Hi there, You are welcome. I am glad I could help you in your analysis.
Do not doubt to contact me for further details if necessary. I am going to close the issue so far, please reopen it or a new one if you require so.
Best regards Jose F. Sanchez
This may be an R question but you may be able to help me with this too:
I tried the following command in R:
data_XICRA <- XICRA.stats::parse_XICRA(path_to_matrix_file)
It worked. When I run "data_XICRA" I can see the rows and columns in R studio.
However, I tried to export/save the data with write.csv or write.table as below:
write.csv(data_XICRA, "miRNA_results.csv")
It gives me the following error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 183767, 1866, 102686
And it doesn't write the output file. Any idea how to fix this?
I google searched this issue but couldn't find the solution :(
Hi there,
The thing is the R object data_XICRA
is a list of 3 dataframes:
data_XICRA$isomir_data
data_XICRA$mirna_data
data_XICRA$variant_data
that is why it generates the error when you try to save in a file.
Basically, the differences among the 3 dataframes are:
data_XICRA$isomir_data
is the isomir data generated by XICRA, the same information as input.data_XICRA$mirna_data
is the data grouped by miRNA. (You might be interested in this dataframe).data_XICRA$variant_data
, is the data grouped by miRNA and variant.Try to find the differences by doing:
head(data_XICRA$isomir_data)
head(data_XICRA$mirna_data)
head(data_XICRA$variant_data)
Regards, Jose
I just completed the analysis and obtained nice and clean miRNA as well as variant data files. Thank you sooooo much. You have been so helpful. Also, the pipeline is very useful too.
Best regards,
Hello,
I analysed the small RNA data using XICRA and it worked. So, I have now output file that's a little bit confusing for me. Here is how it looks like:
ID sample1 sample2 sample3 .... hsa-let-7a-3p&NA&iso-21-4L2MUYWZD 21 5 13 hsa-let-7a-3p&iso_3p:+1&iso-22-4L2MUYWZJ 55 64 101 hsa-let-7a-3p&iso_3p:+4,iso_add3p:2,iso_5p:+3,iso_snv_central&iso-22-UOW5X7Y7M 10 4 22 hsa-let-7a-3p&iso_3p:+4,iso_snv_central,iso_5p:+3,iso_add3p:2&iso-22-UOW5X7Y7N 34 46 40 hsa-let-7a-3p&iso_3p:+4,iso_snv_seed,iso_add3p:2,iso_5p:+3&iso-22-UXW597Y7N 35 3 15
So, main identified small RNA is "hsa-let-7a-3p"? and other part of the name are isoforms of "hsa-let-7a-3p"? and numbers are the read counts? How do we interpret these results?
Many thanks in advance!