databio / peppro

A modular, containerized pipeline for PRO-seq data processing
http://peppro.databio.org/
BSD 2-Clause "Simplified" License

FRIF calculation issue #69

Closed: wongwic closed this issue 4 years ago

wongwic commented 4 years ago

I was using a custom annotation file while running PEPPRO and hit the following error when the FRiF was being calculated. Any idea what could be causing this? My annotation file is pretty strange, but I think my BED file conforms to the standard listed on the site.

Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 0, 2
Calls: plotFRiF -> calcFRiF -> cbind -> cbind -> data.frame
Execution halted
Command completed. Elapsed time: 0:00:32. Running peak memory: 2.795GB.
PID: 21430; Command: Rscript; Return code: 1; Memory used: 0.074GB

jpsmith5 commented 4 years ago

Would you be able to share your custom file? Then I'll test it in my hands and see if I can spot the source of the issue.

There may also be a spot in the function where I can improve the messaging for this sort of issue in the future.

jpsmith5 commented 4 years ago

I'm thinking there's a missing expected column in the coverage files, but looking at the annotation file may give me some insight. If you'd also share the PEPPRO_log.md file, that'd be helpful.

wongwic commented 4 years ago

Thanks for looking into this.

PEPPRO_log.txt Capsaspora_owczarzaki_atcc_30864_gca_000151315_annotation.txt

jpsmith5 commented 4 years ago

Hey @wongwic, would you also be able to share one of the input files to the FRiF plotting function? Maybe just the smallest of the following:

/local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_CDS_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_biological_region_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_exon_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_five_prime_UTR_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_gene_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_lnc_RNA_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_mRNA_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_ncRNA_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_ncRNA_gene_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_rRNA_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_snRNA_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_supercontig_plus_coverage.bed  
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_tRNA_plus_coverage.bed    
 /local/storage/projects/NASA_2020/Capsaspora_r1/peppro/Capsaspora_r1/QC_Capsaspora_owczarzaki/Capsaspora_r1_three_prime_UTR_plus_coverage.bed

If they're all of reasonable size, you could pass along all of them if it's not too much trouble, but I'd like to at least look at one of them to see what I'm working with.

wongwic commented 4 years ago

Capsaspora_bed.zip

jpsmith5 commented 4 years ago

Aha! Got it. A handful of the annotation regions/files/features have start positions of 0.

feature file                                   zero count
Capsaspora_r1_ncRNA_gene_plus_coverage.bed     1
Capsaspora_r1_rRNA_plus_coverage.bed           1
Capsaspora_r1_supercontig_plus_coverage.bed    84
Capsaspora_r1_exon_plus_coverage.bed           1
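
For anyone who wants to run the same kind of check on their own coverage files, here is a minimal Python sketch. It is not PEPPRO code. It assumes tab-delimited BED-style files with the start coordinate in the second column (the BED convention, which is 0-based, so a start of 0 is valid). The function names count_zero_starts and summarize_zero_starts are made up for illustration.

```python
import csv
from pathlib import Path


def count_zero_starts(bed_path):
    """Count rows whose start coordinate (second column) equals 0."""
    zero_rows = 0
    with open(bed_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for fields in reader:
            # Skip blank lines and the header-style lines BED permits.
            if not fields or fields[0].startswith(("#", "track", "browser")):
                continue
            # Skip malformed rows that lack chrom/start/end.
            if len(fields) < 3:
                continue
            if int(fields[1]) == 0:
                zero_rows += 1
    return zero_rows


def summarize_zero_starts(directory, pattern="*_plus_coverage.bed"):
    """Map file name -> number of start-0 rows, for files with at least one."""
    summary = {}
    for path in sorted(Path(directory).glob(pattern)):
        zero_rows = count_zero_starts(path)
        if zero_rows > 0:
            summary[path.name] = zero_rows
    return summary
```

Calling summarize_zero_starts on the QC directory would list only the files that contain at least one region starting at 0, which is the shape of the table above.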

The problem arises with Capsaspora_r1_supercontig_plus_coverage.bed, because that file has only 84 regions in total and every one of them starts at 0. I had been checking for 0's anywhere in a row and dropping those rows, not expecting a start position to ever be zero. That's insignificant in the other files, where it only affects one row, but for the supercontig file it's every row, so the whole file ends up empty.
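
To make the failure mode concrete, here is a small Python sketch of the idea (the pipeline itself uses R, and this is not its code). The row layout, the sample values, and both function names are hypothetical. Restricting the check to the count field is just one possible way to avoid the problem, not necessarily the change that was released.

```python
def drop_rows_with_any_zero(rows):
    """Overly broad filter: discards a row if ANY numeric field is 0."""
    return [
        row for row in rows
        if 0 not in (row["start"], row["end"], row["count"])
    ]


def drop_rows_with_zero_count(rows):
    """Narrower filter: discards a row only when its count is 0."""
    return [row for row in rows if row["count"] != 0]


# Hypothetical coverage rows: every region starts at position 0,
# as happens when each feature spans an entire contig.
rows = [
    {"chrom": "contig_1", "start": 0, "end": 5000, "count": 12},
    {"chrom": "contig_2", "start": 0, "end": 7200, "count": 3},
]

broad = drop_rows_with_any_zero(rows)     # every row is removed
narrow = drop_rows_with_zero_count(rows)  # both rows are kept
print(len(broad), len(narrow))
```

With the broad filter, a file made up entirely of start-0 regions is reduced to nothing, and any later step that expects at least one row has nothing to work with.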

I'm modifying on my side to handle this and will update again when I've released the new version. Thanks so much for working with me here.

jpsmith5 commented 4 years ago

Alright @wongwic. Check out the most recent release. Should address the issue you came across.

I was able to use your files to produce the following plots after the changes.

[Images: frif, cfrif]

wongwic commented 4 years ago

Went well on my end. Thanks.