bicciatolab / Circr

Circr, a computational tool for the prediction of circRNA-miRNA associations.

Missing information of exon coordinates #4

Closed YueqiJin closed 1 year ago

YueqiJin commented 1 year ago

Hello, thank you for the useful tool.

When running Circr, we noticed that the output_fasta.fa generated at line 349 or line 369 contains the full sequence from the BSJ acceptor to the BSJ donor, so the exon structure is not taken into account. Would it be necessary to add the -split option to bedtools getfasta so that the exon coordinates are considered?

When we provided a BED12 file as input to Circr, an error occurred because pandas tried to read the 12-column table as a 6-column table. Here is the log from the run:

    Traceback (most recent call last):
      File "Circr", line 618, in <module>
        main(args.input,
      File "Circr", line 550, in main
        interactions = add_CircBase_annotation(interactions, bedfile, genome_version)
      File "Circr", line 517, in add_CircBase_annotation
        tbm = ix[0] + start.astype(str) + end.astype(str) + ix[2]
    numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types (dtype('int64'), dtype('<U8')) -> None

Jimmy-A-Caroli commented 1 year ago

Dear Jin, sorry for the late reply.

If you specify the "--coord" option when running Circr, the coordinates you provide in the BED file are used as they are. Without "--coord", Circr compares the provided coordinates with the genomic annotation and keeps only the regions that overlap exons. Therefore, to discard the sequences corresponding to introns, you can either:

1) provide a BED file with the coordinates of the full sequence, omit "--coord", and let Circr keep only the exons; or
2) provide a BED file that already contains only the coordinates of each exon and use "--coord".

If you are not obtaining the expected output, the region in your BED file may have no overlap with the gene annotation used by Circr. In that case, please consider option 2) above, or provide another gene annotation (GTF file) that contains the gene you are interested in (see https://github.com/bicciatolab/Circr#providing-all-files).

Regarding your second question, please try providing a BED file with 6 columns, formatted as described in https://github.com/bicciatolab/Circr#running-circr-with-the-provided-annotation-files.
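For anyone starting from a BED12 file, option 2) and the 6-column requirement can be combined by expanding each BED12 record into one 6-column row per exon block. The following is only a minimal sketch, not part of Circr: the function names `bed12_to_exon_bed6` and `convert_file` are hypothetical, and it assumes the 6 columns are the standard BED6 fields (chrom, start, end, name, score, strand) and that the input follows the standard BED12 layout with blockSizes and blockStarts in columns 11 and 12. Whether Circr treats rows sharing a name as exons of the same circRNA is not confirmed here, so please check the README section linked above before relying on it.

```python
def bed12_to_exon_bed6(line):
    """Expand one tab-separated BED12 record into per-exon BED6 rows.

    Hypothetical helper (not part of Circr). Assumes standard BED12:
    blockSizes in column 11, blockStarts (relative to chromStart) in column 12.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected 12 columns, found {len(fields)}")

    chrom, name, score, strand = fields[0], fields[3], fields[4], fields[5]
    chrom_start = int(fields[1])

    # Block lists are comma-separated and may carry a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and blockStarts differ in length")

    rows = []
    for size, offset in zip(sizes, offsets):
        exon_start = chrom_start + offset
        exon_end = exon_start + size
        rows.append("\t".join(
            [chrom, str(exon_start), str(exon_end), name, score, strand]
        ))
    return rows


def convert_file(bed12_path, bed6_path):
    """Write a per-exon BED6 file from a BED12 file, skipping header lines."""
    with open(bed12_path) as src, open(bed6_path, "w") as dst:
        for line in src:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            for row in bed12_to_exon_bed6(line):
                dst.write(row + "\n")


if __name__ == "__main__":
    # Made-up record with two exon blocks, for illustration only.
    example = "chr1\t100\t500\tcirc1\t0\t+\t100\t500\t0\t2\t50,100,\t0,300,"
    for row in bed12_to_exon_bed6(example):
        print(row)
```

The resulting file contains only exon coordinates, so it would be passed to Circr together with "--coord", as in option 2) above.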