CharlesJB / metagene

Artistic License 2.0
11 stars 11 forks source link

metagene analysis error in scan? #107

Closed sabznaz closed 6 years ago

sabznaz commented 6 years ago

Being new to metagene analysis i have already done DEA via edger

`library("metagene") setwd("C:/Users/nath/Documents/R/win-library/3.4/metagene/extdata")

bam_files <- c(system.file("extdata/RNmock.bam", package="metagene"), system.file("extdata/RN4hpi.bam", package="metagene"), system.file("extdata/RN8hpi.bam", package="metagene")) regions <- c(system.file("extdata/hrsvrna.bed", package="metagene")) mg <- metagene$new(regions = regions, bam_files = bam_files, assay = 'rnaseq', paired_end = TRUE) `

after entering the command i try to run it after like 20mins i get a error

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 55 did not have 3 elements

the bam files were made using bowtie2 then used samtools to sort it from sam to bam format along with the bai files in the same directory

the bed file i created in notepad tab delimited its a .bed file the format is the following

chr11 66050729 66069308 chr2 264140 278283 chr11 18394388 18408425 chr10 120354701 120355183 chr1 633696 634376 chr4 41359607 41700044 chr6 33272010 33276510 chr21 29055805 29073797 chr20 49046246 49096960 chr19 2321517 2328620 chr19 12945855 12953642 chr19 12945855 12953642 chr19 18306230 18323274 chr20 62642445 62685785 chr5 135333900 135399914 chr5 177331562 177351852 chr11 10797050 10809110

i like to know what the error is is there a problem with the bed file that the metagene analysis is not recognising or how to resolve this. the bed file i have made has over 100 chromosomes with start and end points.

can you please help as i have been trying for over 2 weeks without any luck

ericfournier2 commented 6 years ago

Looks like a bed problem to me. Could you attach the bed file to the issue? Just drag and drop it to the comment box after renaming it to .txt

sabznaz commented 6 years ago

thanks for the quick response, I have attached the bed file let me know if theres a problem with it hrsv.txt

ericfournier2 commented 6 years ago

Your bed file had inconsistent separators. Everything in it should be separated by a tab. Try this one and see if it fixes the issue. hrsv.txt

sabznaz commented 6 years ago

that error has gone now im getting another error

Error in private$check_bam_levels(bam_file, regions, force_seqlevels = force_seqlevels) : Some seqlevels of regions are absent in bam_file

I have 3 files 1 file is control 1 file is 4hour post infection and 1 is 8hour post infection anyway to bypass this error? also what memory ram is ideal for carrying out metagene analysis as its taken nearly 2 hours then gave the error

ericfournier2 commented 6 years ago

Some levels in your regions (bed file, chr1, chr2, etc.) are not present in your bam files. This will often happen if you align your sequence against genome assembly that use a different chromosome naming schemes: for example, aligning against a UCSC genome, but using region definitions from an ENSEMBL genome.

You can ignore this error by setting the force_seqlevels parameter to TRUE, but be warned that if your region and bam files coordinate systems do not match up, then you will end up with a blank plot.

The amount of RAM needed to use metagene will be directly dependant on your input BAM files. I don't usually do non-trivial work on machines with less than 16MB of RAM, but YMMV.

sabznaz commented 6 years ago

i think thats the issue im having i had a prebuilt index which used ensembl when i did my DE analysis i got the top100 transcripts out from the analysis and then i made the bed file according to the top100 genes

I then downloaded mrna.fa from UCSC to make a bowtie index converted to .sam then .bam how would i get the top100 transcripts ID converted to UCSC from ensembl? I have tried biomart but no luck as of yet

the format of the top100 transcripts is something like this

ENST00000379412.9 ENST00000579374.5

Prior to that I converted ensembl gene ids into entrez for bed files.

the command which you gave earlier is this meant to be the coding for it as im still getting same error when trying to bypass

bam_files <- c(system.file("extdata/RNmock.bam", package="metagene"), system.file("extdata/RN4hpi.bam", package="metagene"), system.file("extdata/RN8hpi.bam", package="metagene")) regions <- c(system.file("extdata/hrsvnew.bed", package="metagene")) mg <- metagene$new(regions = regions, bam_files = bam_files, assay = 'rnaseq', force_seqlevels = TRUE)

Error in private$check_bam_levels(bam_file, regions, force_seqlevels = force_seqlevels) : No seqlevels matching between regions and bam file

Your help would be greatly appreciated.

ericfournier2 commented 6 years ago

Hello sabznaz,

if you made a new index from messenger RNAs and generated bam files against that, then that's what your bed files should refer to, rather than genomic coordinates. I don't know what the sequence names for your mrna.fa file are, so I can't quite help there.

I'm going to close this issue, since this is not a metagene problem, but a general programming/bioinformatics issue. I recommend looking to the bioconductor mailing list or Biostars for further help.