genomicRanges issue - Githubissues

marvel479 commented 4 years ago

Hi, I successfully created the cdata data frame, and the output of cdata for the 2 samples I have looks like this:

  sample     filename
1 zf_control A:/A/circRNAs/Rstudio/input/zf_control.sites.bed
2 zf_gfp     A:/A/circRNAs/Rstudio/input/zf_gfp.sites.bed

But when I try the summarizeCircs function, I get the following error:

Fetching circular expression
Processing linear transcripts
Error in .find_start_end_cols(df_colnames0, start.field0, end.field0) :
  cannnot determine start/end columns

I believe the problem lies in the GenomicRanges function that is used to create the summary object, because when I run try(GenomicRanges:::.find_GRanges_cols(names(cdata))), the result is:

Error in .find_start_end_cols(df_colnames0, start.field0, end.field0) : cannnot determine start/end columns

which is the original error. Am I missing something here?

My BED files come from find_circ and have the following structure:

names(zf_control)

 [1] "chrom" "start" "end" "name" "n_reads"
 [6] "strand" "n_uniq" "uniq_bridges" "best_qual_left" "best_qual_right"
[11] "tissues" "tiss_counts" "edits" "anchor_overlap" "breakpoints"
[16] "signal" "strandmatch" "category"

mschilli87 commented 4 years ago

@marvel479: Can you share the first few lines of A:/A/circRNAs/Rstudio/input/zf_control.sites.bed (the actual file, not after importing it to R) and how you generated it? A:/ looks like you are on Windows but the BED file looks like it was generated using find_circ.py, which AFAIK requires Linux (or at least Mac?) to run.

Also, could you try the following and share the results?

library("ciRcus")
setwd("A:/A/circRNAs/Rstudio")
cdata <- data.frame(sample = "zf_control", filename = "zf_control.sites.bed")
summarizeCircs(cdata, qualfilter = FALSE)

marvel479 commented 4 years ago

Hi Marcel, I used find_circ.py on Linux to generate my BED file and then transferred the file to my computer to use it with RStudio. As BED files can be opened in Excel, this is what it looks like. It is still a .sites.bed file.

Also, I get the same error with summarizeCircs when using R on the command line instead of RStudio.

The results of the commands are as follows:

library(ciRcus)
setwd("A:/A/circRNAs/Rstudio/input")
cdata <- data.frame(sample = "zf_control", filename = "zf_control.sites.bed")

cdata

  sample     filename
1 zf_control zf_control.sites.bed

summarizeCircs(cdata, qualfilter = FALSE)

Fetching circular expression
Error in MungeColumn(merge.fos, circ.gr, circ.gr.reduced, "n_uniq") :
  unknown column name: n_uniq

mschilli87 commented 4 years ago

@marvel479: I specifically asked for the first lines of the actual file. Not imported to R, and definitely not imported to Excel. :wink: On Linux, you'd just run head zf_control.sites.bed.
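For anyone without a Linux shell at hand, roughly the same check can be done from R without parsing the file. This is only a minimal sketch using base R; the demo file written below is made up so the snippet runs anywhere, and `bed_path` would be replaced by the path to the real `.sites.bed` file.

```r
# Hypothetical stand-in for a real *.sites.bed file (contents are made up).
bed_path <- tempfile(fileext = ".sites.bed")
writeLines(c("# chrom\tstart\tend\tname",
             "chr1\t100\t200\tcirc_000001",
             "chr2\t300\t400\tcirc_000002"),
           bed_path)

# Read the first lines as plain text: no column parsing, no type conversion.
first_lines <- readLines(bed_path, n = 2)

# Print them verbatim (tabs included) so they can be pasted as text.
writeLines(first_lines)
```

Because `readLines()` never interprets the columns, what gets printed is what is actually stored in the file, which is the point of the request above.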

Also, please do not post a screenshot, but the actual text, so we can copy/paste it and try to reproduce this problem on our systems.

Did you modify the file after running find_circ.py? Note that simply opening and saving it (without actually changing anything) in e.g. Excel could potentially modify the file. So just in case, I'd highly recommend re-copying the original find_circ.py output files from your Linux machine without opening them in Excel in between. Ideally, you could run the test above using command line R on the very Linux system you ran find_circ.py on, just to rule out RStudio and Windows as additional factors.

@retaj: If this is not some (Excel?) screw-up, is it possible some find_circ.py version generates a header without comment char? From my understanding, there should always be one which seems to be missing here (if Excel doesn't lie). According to ciRcus, for find_circ.py prior to version 2 (18/19 columns like here), there should even be a space:

https://github.com/BIMSBbioinfo/ciRcus/blob/4415370ad5e1791238c50338bdb4bcb11fc7a96b/R/readData.R#L33
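A quick way to test the two things mentioned here (is there a comment character, and how many columns are there) is to look only at the first line of the file. The sketch below is base R only; `inspect_header` is a hypothetical helper, not part of ciRcus, and both demo files are made up.

```r
# Hypothetical helper: summarise the first line of a find_circ-style BED file.
inspect_header <- function(path) {
  first <- readLines(path, n = 1)
  list(
    # Does the line start with the comment character?
    commented = startsWith(first, "#"),
    # Number of tab-separated fields on the line.
    n_fields = length(strsplit(first, "\t", fixed = TRUE)[[1]])
  )
}

# Demo files (made up): one with a commented header, one without.
with_comment <- tempfile(fileext = ".bed")
without_comment <- tempfile(fileext = ".bed")
writeLines("# chrom\tstart\tend\tname", with_comment)
writeLines("chrom\tstart\tend\tname", without_comment)

str(inspect_header(with_comment))
str(inspect_header(without_comment))
```

If `commented` comes back `FALSE` for an untouched find_circ.py output file, that would support the suspicion that the header was written without a comment character; if it is `FALSE` only after a round trip through Excel, the file was most likely altered on the way.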

marvel479 commented 4 years ago

Oh. Oh. I get the problem now. The BED file that I am using is the merged version generated using merge.py, also from the find_circ package. It merges the genomic intervals and puts all circRNAs in those intervals in one row. The problem may also come from my exporting the file to Excel because I wanted to look at it. After switching to the original, unmanipulated version, the code seems to work in both R and RStudio. Thanks a lot, Marcel!

sayan08 commented 2 years ago

Hi, I am having a very similar issue. I have not opened the .bed file elsewhere in between.

I am using the .bed files generated by the find_circ2 pipeline. I am getting error messages very similar to those reported earlier. Could you please help me run summarizeCircs on my data?

Shivachetan-Ulavi commented 2 years ago

head circ_splice_sites.bed

#chrom  start   end     name    n_frags strand  n_weight        n_spanned       n_uniq  uniq_bridges    best_qual_left  best_qual_right tissues tiss_counts     edits   anchor_overlapbreakpoints     signal  strandmatch     category        flags   flag_counts
chr4    83765538        83770130        /data/humgen/daskalakislab/ulavi/PEC/controls/circRNA/CMCMSSM033_circ_008658    1       -       1.0     1       1       1.0     56      3    /data/humgen/daskalakislab/ulavi/PEC/controls/circRNA/CMCMSSM033 1.0     0       0       1       GTAG    N/A             N/A     0

Hey @mschilli87, these are the first few lines of my BED file. This is the unmanipulated version, and I still get the same error when I run summarizeCircs on my data. Could you please help me with this? I am unable to figure out what is wrong.

Thank you
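One check that may help narrow this down is to compare the column names in this header with the ones reported at the top of the thread for find_circ.py output. The sketch below uses base R only; the header string is copied from the post above with the runs of whitespace collapsed to single spaces, and this is an assumption about where the tabs were. It makes no claim about which layouts ciRcus accepts.

```r
# Column names reported earlier in this thread for find_circ.py output.
v1_cols <- c("chrom", "start", "end", "name", "n_reads", "strand", "n_uniq",
             "uniq_bridges", "best_qual_left", "best_qual_right", "tissues",
             "tiss_counts", "edits", "anchor_overlap", "breakpoints",
             "signal", "strandmatch", "category")

# Header line as pasted above (whitespace runs collapsed to single spaces).
header <- paste(
  "#chrom start end name n_frags strand n_weight n_spanned n_uniq",
  "uniq_bridges best_qual_left best_qual_right tissues tiss_counts edits",
  "anchor_overlapbreakpoints signal strandmatch category flags flag_counts"
)

# Drop the leading comment character, then split on whitespace.
v2_cols <- strsplit(sub("^#\\s*", "", header), "\\s+")[[1]]

# Columns present in only one of the two layouts.
only_v2 <- setdiff(v2_cols, v1_cols)
only_v1 <- setdiff(v1_cols, v2_cols)
print(only_v2)
print(only_v1)
```

In the header as pasted, `anchor_overlap` and `breakpoints` come out as one fused name. That may only be a copy/paste artifact, but it is worth confirming against the raw file that a tab separates them.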

BIMSBbioinfo / ciRcus

genomicRanges issue #57