francois-a / fastqtl

Enhanced version of the FastQTL QTL mapper
GNU General Public License v3.0

Bad BED format leading to fastQTL issues #25

Closed harrsha4 closed 2 years ago

harrsha4 commented 2 years ago

Hello, I have been having trouble making a viable BED file from R that can be recognized by bedtools for sorting.

Error Output: Unexpected file format. Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 2

I am not sure whether I can freely share the file, as it contains data from a private database. The file starts with four fields (#Chr, start, stop, TargetID). TargetID contains gene names (e.g. APOE), and the other fields hold the corresponding information (Chr written as an integer). After that, there are over 100 columns of sample data (expression values in decimal notation). As a safety measure, I applied dos2unix to the BED file before using bedtools. The BED file was written with the following R code (on Windows):

write.table(object, file = "filename", quote = F, sep = "\t", row.names = F, col.names = T)

Although bedtools does not recognize my file, I am able to index it with tabix, which is required for fastQTL (https://github.com/francois-a/fastqtl). However, when I try to read the resulting tabix index, I get the following error:

Failed to open file "filename.bed.gz.tbi" : Exec format error Couldn't understand format of "filename.bed.gz.tbi"

The bad tabix format makes it impossible for me to use fastQTL, as I get the following error for every chunk: Failed to get region 9:37753805-107690518 in [filename.bed.gz]

Coming back to bedtools, it seems to recognize the example file given with the fastQTL repository (examples folder: phenotypes.bed.gz) for the same sort function. As such, I am using bedtools as a type of testing mechanism to see if I am making a valid BED file.

Based on how I have prepared my BED file, are there any issues that would result in improper formatting? Thank you in advance, and please ask if you need any more info. I could probably share the expression data, but I need to check the guidelines of the repository before I do so (AMP-AD consortium).
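For anyone debugging a similar file, here is a minimal sketch of a line-by-line check for the problems discussed in this thread: Windows line endings, inconsistent tab-separated column counts, stray spaces inside fields, and start/end values that are not plain integers (for example, values written in scientific notation). The function name `check_bed_lines` and the sample rows are hypothetical and only for illustration; this is not part of fastQTL or bedtools.

```python
def check_bed_lines(lines):
    """Return (line_number, message) tuples for problems found in BED-like lines.

    `lines` is any iterable of raw text lines with their line endings preserved.
    """
    problems = []
    expected_cols = None
    for i, raw in enumerate(lines, start=1):
        # A trailing carriage return means Windows-style line endings.
        if raw.endswith("\r\n") or raw.endswith("\r"):
            problems.append((i, "Windows (CRLF) line ending"))
        line = raw.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        # Every row should have the same number of tab-separated fields.
        if expected_cols is None:
            expected_cols = len(fields)
        elif len(fields) != expected_cols:
            problems.append(
                (i, f"expected {expected_cols} tab-separated fields, found {len(fields)}")
            )
        # Padding spaces around a value break integer parsing downstream.
        if any(f != f.strip() for f in fields):
            problems.append((i, "leading or trailing spaces inside a field"))
        if line.startswith("#"):
            continue  # header line: no coordinates to check
        if len(fields) < 3:
            problems.append((i, "fewer than 3 fields (chr, start, end)"))
            continue
        # Start and end must be plain non-negative integers such as 100000.
        for name, value in (("start", fields[1]), ("end", fields[2])):
            if not (value.isascii() and value.isdigit()):
                problems.append((i, f"{name} is not a plain integer: {value!r}"))
    return problems


# Hypothetical sample rows, not real data.
sample = [
    "#Chr\tstart\tend\tTargetID\tS1\r\n",
    "19\t1e+05\t 200\tAPOE\t0.5\r\n",
]
for line_number, message in check_bed_lines(sample):
    print(f"line {line_number}: {message}")
```

To run it on a real file, open the file with `open(path, newline="")` so that Python does not translate the line endings before the check sees them.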

I did try to switch over to tensorQTL to see if the formatting wasn't as big of an issue with that program, but I am unable to download it from the repository onto my cluster.

harrsha4 commented 2 years ago

Fixed it. For future reference: run dos2unix to convert Windows line endings to Unix ones, then sed 's/ +//g' to remove unnecessary whitespace.
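The same two-step cleanup can be sketched in Python for environments where dos2unix is not installed. This assumes the intent of the sed expression is to delete runs of space characters while leaving tabs alone (in sed itself, `+` only acts as a quantifier with extended regular expressions, e.g. `sed -E`). The function name `clean_bed_text` is hypothetical.

```python
import re


def clean_bed_text(text):
    """Convert CRLF/CR line endings to LF, then delete runs of spaces.

    Tabs are left untouched, so the column structure is preserved.
    """
    # Step 1: the dos2unix equivalent.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 2: remove every run of one or more space characters.
    return re.sub(r" +", "", text)


# Hypothetical row with padding spaces and a Windows line ending.
row = "19\t 100\t200 \tAPOE\t0.5\r\n"
print(repr(clean_bed_text(row)))
```

Like the sed command, this also removes spaces inside fields, so it is only appropriate when no gene or sample name legitimately contains a space. When reading and writing the file, pass `newline=""` to `open()` so that Python does not reintroduce platform-specific line endings.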