husamia opened 3 years ago
Also it seems the region feature data are not correct.
gzip: scotch-data-grch37/20.rfs.gz: not in gzip format
I've gotten as far as needing the genome annotations you provide, but for both GRCh37 and GRCh38 the CHR.rfs.gz files:
a) are plain text files, not gzipped;
b) contain no annotations.
All files, for both builds, contain the following:
$ more 6.rfs.gz
version https://git-lfs.github.com/spec/v1
oid sha256:8d512a5c2a7a9f11947f67c621720c1bba89349ff51f95450b06aa922a5a0339
size 763424410
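That content is a Git LFS pointer file. It records the real object's hash (oid) and size (here 763,424,410 bytes, roughly 763 MB), and it explains why gzip rejects the file. Below is a minimal sketch that tells a pointer apart from real data, assuming only the pointer layout shown above. `read_lfs_pointer` is a hypothetical helper, not part of scotch.

```python
from pathlib import Path

LFS_SPEC = "version https://git-lfs.github.com/spec/v1"


def read_lfs_pointer(path):
    """Return the key/value fields of a Git LFS pointer file, or None if the
    file does not look like a pointer (e.g. it is real gzip data)."""
    raw = Path(path).read_bytes()[:1024]  # pointers are small text files
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return None  # binary content such as gzip, so not a pointer
    if not text.startswith(LFS_SPEC):
        return None
    fields = {}
    for line in text.splitlines():
        # Each pointer line is "<key> <value>", e.g. "size 763424410".
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields
```

For the 6.rfs.gz shown above, the helper would return the version, oid and size fields. For a correctly fetched gzip file it returns None.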
I'm blocked from using your caller at this point. The documentation doesn't make clear what format these RFS files should take, or I'd try to compute them myself. Could you please post instructions on this file format and/or fix the data repos?
Thank you, John Major
For issue #1, it worked for me when I created a directory of BED files, one per chromosome, named 1.bed, 2.bed, and so on. Each file holds a single CHR START END entry spanning the entire chromosome. If the file ends with an extra newline, you'll hit another bug later on, which I fixed with this:
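A directory like that can be generated from the reference's FASTA index. Below is a minimal sketch that assumes the FASTA was indexed with `samtools faidx`, so a `.fai` file exists whose first two tab-separated columns are the sequence name and its length. The helper name, the default chromosome list and the "strip `chr` for the file name" rule are hypothetical choices. The sketch also assumes scotch wants the chromosome name inside the BED exactly as it appears in the reference. Because BED is 0-based and half-open, `0` to `length` covers the whole chromosome.

```python
from pathlib import Path

# Hypothetical default: autosomes plus X and Y (matched with or without "chr").
DEFAULT_CHROMS = [str(i) for i in range(1, 23)] + ["X", "Y"]


def write_whole_chrom_beds(fai_path, beds_dir, chroms=DEFAULT_CHROMS):
    """Write one BED file per chromosome (1.bed, 2.bed, ...), each holding a
    single interval that covers the whole chromosome."""
    out = Path(beds_dir)
    out.mkdir(parents=True, exist_ok=True)
    wanted = set(chroms)
    written = []
    with open(fai_path) as fai:
        for line in fai:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed index lines
            name, length = fields[0], int(fields[1])
            short = name[3:] if name.startswith("chr") else name
            if short not in wanted:
                continue  # e.g. chrM or unplaced contigs
            # 0-based, half-open [0, length) spans the full chromosome.
            # Exactly one line per file, ending in a single newline.
            (out / f"{short}.bed").write_text(f"{name}\t0\t{length}\n")
            written.append(short)
    return written
```

For example, `write_whole_chrom_beds("hg19.fasta.fai", "beds")` would write `beds/1.bed` containing `chr1  0  <length of chr1>` if the reference uses UCSC-style names.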
```python
# getFeatures-getReadFeatures.py (excerpt: b is the open BED file, bed its path)
for region in csv.reader(b, delimiter="\t"):
    if len(region) == 0:
        # csv.reader yields an empty row for a blank line, e.g. at the end of the file.
        print('WARNING', bed, region, 'has length of zero, could be <EOF> or a bug '
              'if there is more than 1 warning...')
    else:
        # add 1 since BED is 0-based
        regions.append([region[0], int(region[1]) + 1, int(region[2]) + 1])
```
I believe the BED spec requires a newline at the end of the file. That was failing the assertion check, however, so I added the if/else block above.
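A self-contained variant of the same loop is sketched below as a hypothetical helper, not scotch's code. It silently skips blank lines and also ignores `#` comment, `track` and `browser` header lines, which BED files may contain. It keeps the snippet's `+ 1` shift on both coordinates unchanged, on the assumption that the rest of scotch relies on it.

```python
import csv

BED_HEADER_PREFIXES = ("#", "track", "browser")


def read_bed_regions(bed_path):
    """Read a BED file into [chrom, start, end] lists, skipping blank lines
    and header/comment lines instead of warning about them."""
    regions = []
    with open(bed_path, newline="") as b:
        for region in csv.reader(b, delimiter="\t"):
            if not region or region[0].startswith(BED_HEADER_PREFIXES):
                continue  # blank line or BED header/comment line
            # Same +1 shift as the snippet above, kept as-is for compatibility.
            regions.append([region[0], int(region[1]) + 1, int(region[2]) + 1])
    return regions
```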
Another odd item: compileFeatures.py has "#!/ibin/bash" as its first line even though it is a Python script. It only caused problems when editing in Emacs, but it's probably a small bug (presumably a Python shebang such as "#!/usr/bin/env python" was intended).
In closing: I'm blocked because no RFS data is available, and I'm really eager to consider this caller for the clinical WGS product I'm developing. Time is short in the investigative phase I'm in now, so I hope this can be resolved soon.
Thank you, John Major
Ah! I think I've sorted out the RFS file problem: you need to use Git LFS (git-lfs), and this should be explained somewhere in the repo. To get it working, I installed git-lfs, moved to the directory containing the RFS.gz files, and ran git-lfs pull. The pull hasn't completed yet, but it seems to be pulling down the annotation files.
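One way to confirm the pull actually replaced the pointers is to check that each file begins with the two gzip magic bytes (0x1f 0x8b). An LFS pointer begins with the text "version" instead. Below is a minimal sketch. `unfetched_rfs_files` and the default `*.rfs.gz` pattern are illustrative assumptions, not scotch features.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def unfetched_rfs_files(rfs_dir, pattern="*.rfs.gz"):
    """Return the names of files matching `pattern` that do not start with
    the gzip magic bytes, i.e. files that are probably still LFS pointers."""
    missing = []
    for path in sorted(Path(rfs_dir).glob(pattern)):
        with open(path, "rb") as fh:
            if fh.read(2) != GZIP_MAGIC:
                missing.append(path.name)
    return missing
```

An empty result, for example from `unfetched_rfs_files("scotch-data-grch37")`, suggests every file in that directory is real gzip data. The check only looks at the first two bytes, so it does not prove a file is complete.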
Specifically, this command needs to be run AFTER you fetch the RFS files:
python ~/scotch/scotch.py prepare-region-features --beds_dir=~/beds/ --all_rfs_dir=~/scotch-data/ --output_trim_rfs_dir=~/trim_rfs/
Hi iamh2o, I'm struggling with this error. I ran the following command (I preferred to run the code for chr 1 first; if all works fine, I will use the loop):
python /home/elsh/scotch/scotch.py get-features-depth --project_dir=/home/elsh/ABC123/ --chrom=1 --beds_dir=/home/elsh/beds/ --fasta_ref=/home/elsh/refseq/hg19.fasta
and got this error:
awk: cmd. line:1: fatal: division by zero attempted
Done.
The output is 3 empty files: depth.feat.gz, depth.feat.log, depth.feat.stats. If you got to this point and it worked well, it would be nice if you could help. Thank you
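The per-chromosome loop mentioned above could be scripted as in the minimal sketch below. It reuses only the flags from that command. The `SCOTCH` path, the helper names `depth_command` and `run_all`, and the chromosome list you pass in are hypothetical, and that list should match the BED files in `beds_dir`.

```python
import subprocess

# Hypothetical path copied from the command above; adjust to your setup.
SCOTCH = "/home/elsh/scotch/scotch.py"


def depth_command(chrom, project_dir, beds_dir, fasta_ref):
    """Build the argv for one get-features-depth run, using only the flags
    shown in the command above."""
    return [
        "python", SCOTCH, "get-features-depth",
        f"--project_dir={project_dir}",
        f"--chrom={chrom}",
        f"--beds_dir={beds_dir}",
        f"--fasta_ref={fasta_ref}",
    ]


def run_all(chroms, project_dir, beds_dir, fasta_ref):
    """Run get-features-depth once per chromosome, in order."""
    for chrom in chroms:
        cmd = depth_command(chrom, project_dir, beds_dir, fasta_ref)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises on a non-zero exit status
```

Note that `check=True` only stops on a non-zero exit status. The run above printed "Done." after the awk error, so a failure like that may not stop the loop, and the depth.feat.* outputs still need to be inspected for each chromosome.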
I am getting an error about the BED file. The instructions don't specify how to generate it.
python scotch/scotch.py prepare-region-features --beds_dir=beds/ --all_rfs_dir=scotch-data-grch37/ --output_trim_rfs_dir=trim_rfs/
Traceback (most recent call last):
  File "scotch/scotch.py", line 423, in