Aufiero / circRNAprofiler


miRs not found #4

Closed: EricSHo closed this issue 4 years ago.

EricSHo commented 4 years ago

Hi,

I'm having trouble getting getMiRsites() to work. checkProjectFolder() returned 0:

check <- checkProjectFolder()
Missing or empty motifs.txt file. Optional file. If absent or empty only ATtRACT motifs will be analyzed
Missing or empty traits.txt file. Optional file. If absent or empty all traits in the GWAS catalog will be analyzed
Missing or empty transcripts.txt. Optional file. If absent or empty the longest transcripts for all circRNAs will be analyzed
check
[1] 0

My miRs.txt contains 1235 mmu miRBase IDs, e.g.:

id
mmu-let-7g
mmu-let-7i
mmu-mir-1a-1
mmu-mir-15b
mmu-mir-23b
mmu-mir-27b
...

Here's how I called the getMiRsites(...) function:

miRsites <- getMiRsites( targetsFTS_circ, miRspeciesCode = "mmu", miRBaseLatestRelease = TRUE, totalMatches = 6, maxNonCanonicalMatches = 1 )

targetsFTS_circ contains 30 circRNAs. getMiRsites() first printed the following:

miRs not found: >mmu-let-7g, >mmu-let-7i, >mmu-mir-1a-1, >mmu-mir-15b, >mmu-mir-23b, >mmu-mir-27b, >mmu-mir-29b-1, >mmu-mir-30a, >mmu-mir-30b, ...

Then it crashed with the following error message:

Error in while (indexTargetSeq <= (circLen + (analysisStart - 1))) { : missing value where TRUE/FALSE needed

What might be wrong?

Help appreciated.

Eric.

Aufiero commented 4 years ago

Hi Eric,

Let me check and I'll get back to you ASAP.

S

Aufiero commented 4 years ago

Hi Eric,

The miR names are not correct. Names such as mmu-mir-15b are precursor (hairpin) IDs; you need the mature miR names instead, e.g. mmu-miR-15b-5p or mmu-miR-15b-3p instead of mmu-mir-15b, and mmu-miR-23b-5p or mmu-miR-23b-3p instead of mmu-mir-23b. If you want to analyze only these miRs, go to http://www.mirbase.org/ftp.shtml, download the mature.fa file, and look up the names of the miRs you are interested in. Then put those names in the miRs.txt file:

id
>mmu-miR-15b-5p
>mmu-miR-15b-3p
>mmu-miR-23b-5p
>mmu-miR-23b-3p
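A minimal base-R sketch of that lookup, assuming a local copy of miRBase's mature.fa whose header lines start with `>` followed by the mature miR name (e.g. `>mmu-miR-15b-5p MIMAT... Mus musculus miR-15b-5p`), and the miRs.txt layout shown above (an `id` header, then one `>`-prefixed name per line). `build_mirs_txt()` is a hypothetical helper, not part of circRNAprofiler. When `wanted` is given, it also warns about names that do not occur in mature.fa, so hairpin IDs like the ones in the original list are caught before getMiRsites() runs.

```r
# Hypothetical helper (not part of circRNAprofiler): write miRs.txt from a
# local copy of miRBase's mature.fa, using the layout shown above:
# an "id" header line, then one ">"-prefixed mature miR name per line.
build_mirs_txt <- function(mature_fa, species_code, wanted = NULL,
                           out_file = "miRs.txt") {
  lines <- readLines(mature_fa)
  # FASTA header lines start with ">"; keep only their first field,
  # e.g. ">mmu-miR-15b-5p MIMAT... Mus musculus ..." -> ">mmu-miR-15b-5p"
  headers <- lines[startsWith(lines, ">")]
  ids <- sub("[[:space:]].*$", "", headers)
  # Keep one species, e.g. species_code = "mmu" keeps ">mmu-..." names
  ids <- ids[startsWith(ids, paste0(">", species_code, "-"))]
  if (!is.null(wanted)) {
    # Accept requested names with or without the leading ">"
    wanted <- sub("^>?", ">", wanted)
    # Warn about names absent from mature.fa, e.g. hairpin IDs such as
    # "mmu-mir-15b" (lowercase "mir") instead of mature "mmu-miR-15b-5p"
    missing <- setdiff(wanted, ids)
    if (length(missing) > 0) {
      warning("Not found in mature.fa: ", paste(missing, collapse = ", "))
    }
    ids <- intersect(wanted, ids)
  }
  writeLines(c("id", ids), out_file)
  invisible(ids)
}

# Example (file names are placeholders):
# build_mirs_txt("mature.fa", "mmu",
#                wanted = c("mmu-miR-15b-5p", "mmu-miR-15b-3p", "mmu-mir-23b"))
```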

Let me know if it works.

S

EricSHo commented 4 years ago

Thanks a lot.

It's working, but it takes a long time to run. I did home in on the specific tissue, yet there are still 1200 miRNAs to process. One more question: if I have 9200 target circRNAs, do you know how long the analysis will take?

Thanks again for your help.

Eric.

Aufiero commented 4 years ago

Hi Eric,

The analysis takes time because each circRNA sequence has to be scanned for the presence of every miR sequence. The run time also depends on the length of the circRNA sequences and on the number of miRs. 9200 circRNAs is quite a lot, so it will probably take a long time to finish (probably days). You can open a new R session and leave it open until the run finishes. You can also estimate the run time with a test run on just 1 circRNA against the 1200 miRs.
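The pilot-run estimate can be turned into a rough extrapolation. This sketch ASSUMES run time grows roughly linearly with the total circRNA length scanned and with the number of miRs (the two factors named above); real timings may deviate. `estimate_runtime_hours()` is a hypothetical helper. The pilot time itself could be measured by wrapping the one-circRNA getMiRsites() call in base R's `system.time()`.

```r
# Hypothetical helper: extrapolate total getMiRsites() run time from a pilot
# run. ASSUMPTION: time is proportional to (total nt scanned) x (number of miRs).
estimate_runtime_hours <- function(pilot_hours, pilot_length_nt, pilot_n_mirs,
                                   target_lengths_nt, target_n_mirs) {
  # Hours per nucleotide per miR, calibrated on the pilot run
  rate <- pilot_hours / (pilot_length_nt * pilot_n_mirs)
  # Scale up to all target circRNAs and the full miR list
  rate * sum(target_lengths_nt) * target_n_mirs
}

# Pilot timing reported later in this thread: ~1700 nt, 360 miRs, ~4 h 20 min.
# Under the linear assumption, the same circRNA against 1200 miRs would take:
estimate_runtime_hours(260 / 60, 1700, 360, 1700, 1200)
#> [1] 14.44444
```

Because the estimate is linear in the summed circRNA length, every additional circRNA adds to the total, which is why filtering both the circRNA and the miR lists shortens the run.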

I also suggest that you run the analysis only for the differentially expressed or the highly expressed circRNAs.

S

EricSHo commented 4 years ago

Thanks again.

I'll take your advice, or maybe run it in batches.

Regardless, after hours of waiting, the job crashed with the following error:

[Screenshot of the error message (Screen Shot 2020-07-12 at 10:14:25 AM); image not available]
EricSHo commented 4 years ago

I have filtered the circRNA sequences and split them into batches of 20 sequences each. However, the first batch has already been running for 50 minutes. I am wondering whether retrieving the miRNA sequences from miRBase is the bottleneck. Since I have already downloaded mature.fa, is it possible to speed up the process by providing that file?
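Batching can also limit the damage from crashes like the one above: if each batch's result is written to disk as soon as it finishes, a crash only loses the batch in progress, and a rerun resumes where it stopped. A minimal base-R sketch; `run_in_batches()` and `analyze_batch` are hypothetical, with `analyze_batch` standing for your own wrapper that calls getMiRsites() on one batch (how to subset the circRNA targets object is not shown here).

```r
# Hypothetical helper: run `analyze_batch` on fixed-size batches of `items`
# (a vector, list, or data frame with one row per circRNA), saving each
# batch's result to an .rds file so a crash only loses the batch in progress.
# Use a fresh `out_dir` per analysis: existing batch files are treated as done.
run_in_batches <- function(items, batch_size, analyze_batch, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # Batch number for each element/row: 1 for the first batch_size, then 2, ...
  batch_id <- ceiling(seq_len(NROW(items)) / batch_size)
  batches <- split(items, batch_id)
  for (b in names(batches)) {
    out_file <- file.path(out_dir, sprintf("batch_%04d.rds", as.integer(b)))
    if (file.exists(out_file)) next  # finished in an earlier run: skip it
    saveRDS(analyze_batch(batches[[b]]), out_file)
  }
  # Reload all batch results in batch order (zero-padded names sort correctly)
  files <- sort(list.files(out_dir, pattern = "^batch_[0-9]+\\.rds$",
                           full.names = TRUE))
  lapply(files, readRDS)
}

# Example (analyze_batch would wrap getMiRsites() for one batch):
# res <- run_in_batches(circ_table, 20, analyze_batch, "miRsites_batches")
```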

Thank you for your support along the way.

Eric.

Aufiero commented 4 years ago

Hi Eric,

Retrieving the miR sequences is quite fast; it is the analysis itself that takes time, because each circRNA sequence has to be scanned for the presence of each miR sequence. In the next release, I'll see whether I can make it faster, maybe with parallel computing. FYI, analyzing 1 circRNA sequence of ~1700 nt for the presence of 360 miRs took ~4 h 20 min. For now, what you can do is filter more stringently to reduce the number of circRNAs and miRs.
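Until then, independent batches could be run in parallel on the user side. A sketch using base R's parallel package, ASSUMING getMiRsites() can safely run in several processes at once (not verified here). `run_batches_parallel()` is hypothetical, and `analyze_batch` is a hypothetical user-written wrapper that runs getMiRsites() on one batch of circRNAs. Each worker keeps its own copy of the data, so memory use grows with `n_cores`.

```r
library(parallel)

# Hypothetical sketch: analyze several circRNA batches at once in forked
# worker processes. `analyze_batch` is an assumed user-written wrapper around
# getMiRsites() for one batch; running getMiRsites() concurrently is untested.
run_batches_parallel <- function(batches, analyze_batch, n_cores = 2L) {
  # mc.preschedule = FALSE forks one process per batch, which suits a few
  # long jobs of uneven length. Forking is unavailable on Windows, where
  # mclapply() only accepts mc.cores = 1 (i.e. sequential execution).
  results <- mclapply(batches, analyze_batch,
                      mc.cores = n_cores, mc.preschedule = FALSE)
  # With n_cores >= 2, a batch whose worker hit an error comes back as a
  # "try-error" object instead of stopping the other batches
  failed <- vapply(results, inherits, logical(1), what = "try-error")
  if (any(failed)) {
    warning(sum(failed), " batch(es) failed: ",
            paste(which(failed), collapse = ", "))
  }
  results
}

# Example (hypothetical objects):
# batches <- split(circ_ids, ceiling(seq_along(circ_ids) / 20))
# res <- run_batches_parallel(batches, analyze_batch, n_cores = 4L)
```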

As for the error you got, I don't know what caused it; I cannot reproduce it.

S

EricSHo commented 4 years ago

No problem. The circRNA sequence extraction function has already helped a lot. Thank you for sharing your work.