Open lakewoo opened 2 years ago
Please try the command below in your environment to check bedtools:
bedtools getfasta -fi /Your_Path_To/genome.fa -bed PBMC3k.exonic.peaks.annotated.bed -s -split -name > PBMC3k.exonic.peaks.annotated.fa
We recommend bedtools 2.26.0 for SCAPTURE:
conda install -c biobuilds bedtools
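Before rerunning, it may help to confirm which bedtools version is actually on your PATH. The following is a minimal sketch, not part of SCAPTURE: the function names and the `RECOMMENDED` constant are hypothetical, and it assumes `bedtools --version` prints a string such as `bedtools v2.25.0`.

```python
import re
import shutil
import subprocess

# Version recommended in this thread (hypothetical constant name).
RECOMMENDED = (2, 26, 0)


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from text such as 'bedtools v2.25.0'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_bedtools_version():
    """Return the version of the bedtools found on PATH, or None if unavailable."""
    if shutil.which("bedtools") is None:
        return None
    result = subprocess.run(
        ["bedtools", "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    try:
        return parse_bedtools_version(result.stdout or result.stderr)
    except ValueError:
        return None


if __name__ == "__main__":
    found = installed_bedtools_version()
    if found is None:
        print("bedtools not found on PATH, or its version could not be read")
    elif found < RECOMMENDED:
        print("bedtools %d.%d.%d is older than the recommended 2.26.0" % found)
    else:
        print("bedtools %d.%d.%d found" % found)
```

Tuples compare element by element, so `(2, 25, 0) < (2, 26, 0)` holds without any extra version-parsing library.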
If I can be of assistance, please do not hesitate to contact me.
Hi,
I ran into an issue when running PAScall with the test dataset PBMC3K. The DeepPASS.predict step threw the error listed below. The PBMC3k.*.peaks.annotated.bed outputs contained just a few lines, and then the program stopped.
Thank you for your help in advance!
##############
Here is the log file from "PBMC3K.exonic.peaks.DeepPASS.predict.log":
2022-10-19 13:04:06.688118: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2699950000 Hz
2022-10-19 13:04:06.693677: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55fd635f5ef0 executing computations on platform Host. Devices:
2022-10-19 13:04:06.693779: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2022-10-19 13:04:06.718258: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib:/usr/local/lib:/usr/local/gmp/5.1.1/lib64:/gpfs/tools/gsl/lib:/gpfs/tools/icu58/lib/:/gpfs/tools/anaconda/lib:/gpfs/tools/gsl/lib:/gpfs/tools/libiconv-1.15/lib
2022-10-19 13:04:06.718430: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2022-10-19 13:04:06.718510: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (amber): /proc/driver/nvidia/version does not exist
** Start Data Processing **
Traceback (most recent call last):
  File "/home/jt/gpfs_analyses/Hao/SCAPTURE/DeepPASS/Predict.py", line 69, in <module>
    dataset_pre = pd.read_csv(pre_f,sep='\t',index_col = False,header = None)
  File "/gpfs/tools/anaconda/envs/SCAPTURE_env/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/gpfs/tools/anaconda/envs/SCAPTURE_env/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, kwds)
  File "/gpfs/tools/anaconda/envs/SCAPTURE_env/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/gpfs/tools/anaconda/envs/SCAPTURE_env/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/gpfs/tools/anaconda/envs/SCAPTURE_env/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
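For readers hitting the same traceback: `pandas.errors.EmptyDataError` is what `pd.read_csv` raises when the file it is given has no content to parse, so the problem most likely lies in whichever upstream step wrote that file. A minimal sketch of a pre-check is shown below. The helper name `has_parsable_rows` is hypothetical and is not part of DeepPASS or pandas; it only uses the standard library.

```python
import os


def has_parsable_rows(path):
    """Return True if path is a regular file with at least one non-blank line."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "r") as handle:
        # Stop at the first line that contains anything besides whitespace.
        return any(line.strip() for line in handle)
```

Calling this on the file passed to `pd.read_csv` (named `pre_f` in the traceback) before parsing would let a script report an empty input explicitly instead of failing inside pandas.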
And here is the log file from PBMC3k.PAScall.log:
scapture path: /home/SCAPTURE/
DeepPASS model file dir: /home/SCAPTURE//DeepPASS/
scapture module: PAScall
Output prefix: PBMC3k
prefix of annotation files from annotation module: ../annotation/SCAPTURE_annotation
BAM file: ./PBMC3k.test.bam
Fragment length: 98
GENOME file: ../annotation/GRCh38.p13.genome.fa
Peak width: 400
OverlapRatio: 0.5
threads: 16
poly(a) database file: ../SupTab_KnownPASs_fourDBs.txt
scapture PAScall: create command line. Wed Oct 19 18:14:26 PDT 2022
scapture PAScall: create command line done. Wed Oct 19 18:14:35 PDT 2022
scapture PAScall: peak calling. Wed Oct 19 18:14:35 PDT 2022
scapture PAScall: peak calling done. Wed Oct 19 18:17:07 PDT 2022
scapture PAScall: peak annotating. Wed Oct 19 18:17:07 PDT 2022
scapture PAScall: peak annotating done. Wed Oct 19 18:17:13 PDT 2022
scapture PAScall: PAS evaluating. Wed Oct 19 18:17:13 PDT 2022
Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.
Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta>
Options:
    -fi          Input FASTA file
    -bed         BED/GFF/VCF file of ranges to extract from -fi
    -fo          Output file (can be FASTA or TAB-delimited)
    -name        Use the name field for the FASTA header
    -split       given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
    -tab         Write output in TAB delimited format.
                 Default is FASTA format.
    -s           Force strandedness. If the feature occupies the antisense, strand, the sequence will be reverse complemented.
                 By default, strand information is ignored.
    -fullHeader  Use full fasta header.
cat: PBMC3k.exonic.peaks.DeepPASS.predictout/Predict_Result.txt: No such file or directory
.....
cat: PBMC3k.intronic.peaks.DeepPASS.predictout/Predict_Result.txt: No such file or directory
...
cat: PBMC3k.3primeExtended.peaks.DeepPASS.predictout/Predict_Result.txt: No such file or directory
***** ERROR: too many digits/characters for integer conversion in string . Exiting...
scapture PAScall: output files -- PBMC3k.exonic.peaks.evaluated.bed PBMC3k.intronic.peaks.evaluated.bed PBMC3k.3primeExtended.peaks.evaluated.bed
scapture PAScall: PAS evaluating done. Wed Oct 19 18:18:26 PDT 2022
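Because the log reports "PAS evaluating done" even though the prediction files are missing, it can be useful to find the first intermediate file that is absent or empty for each peak class. The sketch below is only an illustration under stated assumptions: the `.annotated.bed` and `Predict_Result.txt` names come from this thread, the `.annotated.fa` name for the intronic and 3primeExtended classes is assumed by analogy with the exonic one, and the function and constant names are hypothetical rather than part of SCAPTURE.

```python
import os

PEAK_CLASSES = ("exonic", "intronic", "3primeExtended")

# Per-class files in the order they appear to be produced in this thread.
# The .fa name for non-exonic classes is an assumption.
STEP_PATTERNS = (
    "{prefix}.{cls}.peaks.annotated.bed",
    "{prefix}.{cls}.peaks.annotated.fa",
    "{prefix}.{cls}.peaks.DeepPASS.predictout/Predict_Result.txt",
)


def file_status(path):
    """Classify a path as 'missing', 'empty', or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok"


def first_failing_step(prefix, cls, workdir="."):
    """Return (path, status) for the first file that is not 'ok', else None."""
    for pattern in STEP_PATTERNS:
        path = os.path.join(workdir, pattern.format(prefix=prefix, cls=cls))
        status = file_status(path)
        if status != "ok":
            return path, status
    return None


if __name__ == "__main__":
    for peak_class in PEAK_CLASSES:
        print(peak_class, first_failing_step("PBMC3k", peak_class))
```

Run from the PAScall output directory, this prints `None` for a class whose three files all exist and are non-empty, and otherwise the first path that is missing or empty.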