colomemaria / epiAneufinder

R package to detect breakpoints and assign somies to scATAC-seq data
GNU General Public License v3.0
33 stars 7 forks source link

count matrix is empty and the execution crashes #1

Closed Laolga closed 1 year ago

Laolga commented 2 years ago

Hi! I've run the package with the sample data and it went smoothly. Next, I tried to run it on my atac-seq data. I replaced the input file, genome (BSgenome.Hsapiens.NCBI.GRCh38), the corresponding blacklist and change the chromosome notation in the exclude variable (e.g. exclude=c("X","Y","MT")). Then it seems like the windows are computed successfully, but the count matrix is somehow produced empty. This is how fragments variable looks: image And this is windows:

image Don't mind weird ranges in fragments, that was the intentional design. Could you please point me towards a possible issue?

Thanks! Olga

akshaya4r commented 2 years ago

Hi Olga,

I believe the count matrix is empty because your fragments are larger than your window sizes. The count matrix can't be generated in this case.

Laolga commented 2 years ago

Hi! So I made windowSize=1e6 but nothing changed:

 "Obtaining the fragments tsv file"
GRanges object with 6 ranges and 2 metadata columns:
      seqnames    ranges strand |            barcode       pcr
         <Rle> <IRanges>  <Rle> |        <character> <integer>
  [1]        1 1-1000000      * | AAACGAAAGACGCCCT-1         6
  [2]        1 1-1000000      * | AAACGAAAGAGTTCGG-1        24
  [3]        1 1-1000000      * | AAACGAAAGGATTAAC-1         2
  [4]        1 1-1000000      * | AAACGAACATGTAGAA-1         0
  [5]        1 1-1000000      * | AAACGAACATTTGGCA-1        52
  [6]        1 1-1000000      * | AAACGAAGTAATGCAA-1        26
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths
NULL
Getting Counts...
Counting reads from fragments/bed file .. 
[1] "Count matrix has been generated and will be saved as count_summary.rds"
Error in setnames(x, value) : 
  Can't assign 1 names to a 0 column data.table
In addition: Warning messages:
1: In dir.create(outdir, recursive = TRUE) :
  'results_sample/epiAneufinder_results' already exists
2: In file.remove(list.files(outdir, full.names = TRUE)) :
  cannot remove file 'results_sample/epiAneufinder_results/epiAneufinder_results', reason 'Directory not empty'
3: In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)

Is there something I can do?

akshaya4r commented 2 years ago

Hi Olga, It seems that the chromosome names in your BS genome and the chromosome names in the fragments file don't match.

Suger0917 commented 2 years ago

Hi AR,

Thanks for the tool. I also meet the same error: Error in setnames(x, value) : Can't assign 1 names to a 0 column data.table

My samples aligned to another reference, like from NCBI but not UCSC.

My input sample.tsv looks like this: 1 10082 10309 AAGAACAGTTTAGTCC-1 1 1 10083 10309 ACCAGCTCAAACGGGC-1 1 1 10084 10303 TGGTCAGTCCGCACAA-1 1 1 10085 10436 AACAGGATCATGAGCT-1 1 1 10085 10531 TGTAATGTCGCTAGAT-1 1 1 10096 10339 GTTTGCTGTGTCCAGG-1 2 1 10097 10496 GCGTTTCTCTAGCTTT-1 1 1 10106 10531 TCGTTAAAGTTCCTGC-1 1 1 10125 10316 GAAGGAACAAGCCAGA-1 1 1 10156 10507 AGGTAACCATCCCTCA-1 1

So I replaced the input file genome (BSgenome.Hsapiens.NCBI.GRCh38)

Human genome:

organism: Homo sapiens (Human)

genome: GRCh38

provider: NCBI

release date: 2013-12-17

455 sequences:

1 2

3 4

5 6

7 8

9 10

... ...

HSCHR19KIR_FH05_B_HAP_CTG3_1 HSCHR19KIR_FH06_A_HAP_CTG3_1

HSCHR19KIR_FH06_BA1_HAP_CTG3_1 HSCHR19KIR_FH08_A_HAP_CTG3_1

HSCHR19KIR_FH08_BAX_HAP_CTG3_1 HSCHR19KIR_FH13_A_HAP_CTG3_1

HSCHR19KIR_FH13_BA2_HAP_CTG3_1 HSCHR19KIR_FH15_A_HAP_CTG3_1

HSCHR19KIR_RP5_B_HAP_CTG3_1

I excluded all of other chromosomes not in 1-22.

Hi Olga, It seems that the chromosome names in your BS genome and the chromosome names in the fragments file don't match.

KatharinaSchmid commented 1 year ago

Hi, first of all, very sorry for coming back to you so late. It is difficult to say what exactly causes the issue without seeing the code. In general, we tested EpiAneufinder to run with NCBI instead of UCSC genome annotations, this is working well. It is important to set in this case not only the genome version correctly, but also the backlist file accordingly and the parameter exclude for excluding chromosomes. We improved the error messages in our newest package version 1.0.2 to better capture mismatching chromosome naming. So we expect that these cryptic error messages are not happening anymore. Please come back to us if you have still problems here. I will close this issue for now, feel free to reopen it.