DivyaratanPopli / Kinship_Inference

This is a tool to estimate pairwise relatedness from ancient DNA, taking into account contamination, ROH, and ascertainment bias.
GNU General Public License v3.0

Issues with KINgaroo #5

Closed szecsenyinagy closed 1 year ago

szecsenyinagy commented 1 year ago

Dear Divyaratan,

I created the suggested conda environment and installed the programs in it. I used the suggested input formats for the bed and bam files. KIN works with the toy examples, but KINgaroo does not: it only creates some empty folders. Running with -test 1 also fails and prints:

(KIN) archeogen@archeogen-MS-7B17:/mnt/83a60764-36ca-41f4-b92d-3b3f715ba1a7/AvarJena/KIN$ KINgaroo -bam /mnt/83a60764-36ca-41f4-b92d-3b3f715ba1a7/AvarJena/KIN -bed 1240k_nochr_20221006.bed -T bamlist.txt -cnt 0 -test 1
usage: KINgaroo [-h] -bam -bed -T -cnt [-c] [-i] [-t] [-cest] [-d] [-tar] [-cont] [-r] [-p]
KINgaroo: error: argument -t/--threshold: invalid int value: 'est'
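The `invalid int value: 'est'` message is consistent with how Python's `argparse` handles single-dash options: if `-test` is not a registered option but `-t` is, the parser reads `-test` as `-t` with the attached value `est`. The sketch below reproduces this with a hypothetical one-option parser. It is not KINgaroo's actual parser, and the assumption that KINgaroo uses `argparse` rests only on the format of the usage message above.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so the message can be inspected."""

    def error(self, message):
        raise ValueError(message)


def build_parser():
    # Hypothetical stand-in: only -t/--threshold (an int) is defined,
    # and no -test option exists, mirroring the usage line in the report.
    parser = RaisingParser(prog="KINgaroo")
    parser.add_argument("-t", "--threshold", type=int)
    return parser


def parse_error(argv):
    """Return the parser's error message for argv, or None if parsing succeeds."""
    try:
        build_parser().parse_args(argv)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # "-test" is split into "-t" plus the attached value "est".
    print(parse_error(["-test", "1"]))
```

If this is what happens, the installed version does not recognize `-test` at all, which would fit the later suggestion in this thread that an older version is installed.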

I also see something similar in another GitHub issue, and I don't have bam-rmdup from biohazard installed. Is that still needed? Here is the terminal output showing the conda packages and the KINgaroo installation:

(python38) archeogen@archeogen-MS-7B17:~$ conda create -n KIN python=3.8 scipy=1.8.0 numpy=1.21.1 pandas=1.3.1 numba=0.55.1 pysam=0.19.0 pybedtools=0.9.0
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

Package Plan

environment location: /home/archeogen/anaconda3/envs/KIN

added / updated specs:

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
libdeflate-1.10            |       h7f98852_0          77 KB  conda-forge
libllvm11-11.1.0           |       he0ac6c6_4        28.8 MB  conda-forge
llvmlite-0.38.1            |   py38h38d86a4_0         2.3 MB  conda-forge
numba-0.55.1               |   py38hdc3674a_1         3.8 MB  conda-forge
numpy-1.21.1               |   py38h9894fe3_0         6.2 MB  conda-forge
pandas-1.3.1               |   py38h1abd341_0        13.0 MB  conda-forge
pysam-0.19.0               |   py38h8bf8b8d_0         2.7 MB  bioconda
scipy-1.8.0                |   py38h56a6a73_1        23.6 MB  conda-forge
setuptools-59.8.0          |   py38h578d9bd_1        1017 KB  conda-forge
------------------------------------------------------------
                                       Total:        81.5 MB

The following NEW packages will be INSTALLED:

_libgcc_mutex conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge None
_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-2_gnu None
bedtools bioconda/linux-64::bedtools-2.30.0-h468198e_3 None
bzip2 conda-forge/linux-64::bzip2-1.0.8-h7f98852_4 None
c-ares conda-forge/linux-64::c-ares-1.18.1-h7f98852_0 None
ca-certificates conda-forge/linux-64::ca-certificates-2022.9.24-ha878542_0 None
keyutils conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0 None
krb5 conda-forge/linux-64::krb5-1.19.3-h3790be6_0 None
ld_impl_linux-64 conda-forge/linux-64::ld_impl_linux-64-2.36.1-hea4e1c9_2 None
libblas conda-forge/linux-64::libblas-3.9.0-16_linux64_openblas None
libcblas conda-forge/linux-64::libcblas-3.9.0-16_linux64_openblas None
libcurl conda-forge/linux-64::libcurl-7.85.0-h7bff187_0 None
libdeflate conda-forge/linux-64::libdeflate-1.10-h7f98852_0 None
libedit conda-forge/linux-64::libedit-3.1.20191231-he28a2e2_2 None
libev conda-forge/linux-64::libev-4.33-h516909a_1 None
libffi conda-forge/linux-64::libffi-3.4.2-h7f98852_5 None
libgcc-ng conda-forge/linux-64::libgcc-ng-12.1.0-h8d9b700_16 None
libgfortran-ng conda-forge/linux-64::libgfortran-ng-12.1.0-h69a702a_16 None
libgfortran5 conda-forge/linux-64::libgfortran5-12.1.0-hdcd56e2_16 None
libgomp conda-forge/linux-64::libgomp-12.1.0-h8d9b700_16 None
liblapack conda-forge/linux-64::liblapack-3.9.0-16_linux64_openblas None
libllvm11 conda-forge/linux-64::libllvm11-11.1.0-he0ac6c6_4 None
libnghttp2 conda-forge/linux-64::libnghttp2-1.47.0-hdcd2b5c_1 None
libnsl conda-forge/linux-64::libnsl-2.0.0-h7f98852_0 None
libopenblas conda-forge/linux-64::libopenblas-0.3.21-pthreads_h78a6416_3 None
libsqlite conda-forge/linux-64::libsqlite-3.39.4-h753d276_0 None
libssh2 conda-forge/linux-64::libssh2-1.10.0-haa6b8db_3 None
libstdcxx-ng conda-forge/linux-64::libstdcxx-ng-12.1.0-ha89aaad_16 None
libuuid conda-forge/linux-64::libuuid-2.32.1-h7f98852_1000 None
libzlib conda-forge/linux-64::libzlib-1.2.12-h166bdaf_4 None
llvmlite conda-forge/linux-64::llvmlite-0.38.1-py38h38d86a4_0 None
ncurses conda-forge/linux-64::ncurses-6.3-h27087fc_1 None
numba conda-forge/linux-64::numba-0.55.1-py38hdc3674a_1 None
numpy conda-forge/linux-64::numpy-1.21.1-py38h9894fe3_0 None
openssl conda-forge/linux-64::openssl-1.1.1q-h166bdaf_0 None
pandas conda-forge/linux-64::pandas-1.3.1-py38h1abd341_0 None
pip conda-forge/noarch::pip-22.2.2-pyhd8ed1ab_0 None
pybedtools bioconda/linux-64::pybedtools-0.9.0-py38hf4f3596_1 None
pysam bioconda/linux-64::pysam-0.19.0-py38h8bf8b8d_0 None
python conda-forge/linux-64::python-3.8.13-h582c2e5_0_cpython None
python-dateutil conda-forge/noarch::python-dateutil-2.8.2-pyhd8ed1ab_0 None
python_abi conda-forge/linux-64::python_abi-3.8-2_cp38 None
pytz conda-forge/noarch::pytz-2022.4-pyhd8ed1ab_0 None
readline conda-forge/linux-64::readline-8.1.2-h0f457ee_0 None
scipy conda-forge/linux-64::scipy-1.8.0-py38h56a6a73_1 None
setuptools conda-forge/linux-64::setuptools-59.8.0-py38h578d9bd_1 None
six conda-forge/noarch::six-1.16.0-pyh6c4a22f_0 None
sqlite conda-forge/linux-64::sqlite-3.39.4-h4ff8645_0 None
tk conda-forge/linux-64::tk-8.6.12-h27826a3_0 None
wheel conda-forge/noarch::wheel-0.37.1-pyhd8ed1ab_0 None
xz conda-forge/linux-64::xz-5.2.6-h166bdaf_0 None
zlib conda-forge/linux-64::zlib-1.2.12-h166bdaf_4 None

Proceed ([y]/n)? y

Downloading and Extracting Packages
scipy-1.8.0 | 23.6 MB | ########## | 100%
numba-0.55.1 | 3.8 MB | ########## | 100%
setuptools-59.8.0 | 1017 KB | ########## | 100%
numpy-1.21.1 | 6.2 MB | ########## | 100%
llvmlite-0.38.1 | 2.3 MB | ########## | 100%
pandas-1.3.1 | 13.0 MB | ########## | 100%
libdeflate-1.10 | 77 KB | ########## | 100%
libllvm11-11.1.0 | 28.8 MB | ########## | 100%
pysam-0.19.0 | 2.7 MB | ########## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate KIN
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Retrieving notices: ...working... done
(python38) archeogen@archeogen-MS-7B17:~$ conda activate KIN

(KIN) archeogen@archeogen-MS-7B17:~$ pip3 install /home/archeogen/programok/Kinship_Inference-v3.1.2/DivyaratanPopli-Kinship_Inference-b3823c0/pypackage/kingaroo
Processing ./programok/Kinship_Inference-v3.1.2/DivyaratanPopli-Kinship_Inference-b3823c0/pypackage/kingaroo
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: pandas in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from KINgaroo==0.1.0) (1.3.1)
Requirement already satisfied: scipy in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from KINgaroo==0.1.0) (1.8.0)
Requirement already satisfied: pysam in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from KINgaroo==0.1.0) (0.19.0)
Requirement already satisfied: pybedtools in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from KINgaroo==0.1.0) (0.9.0)
Requirement already satisfied: numba in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from KINgaroo==0.1.0) (0.55.1)
Requirement already satisfied: numpy in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from KINgaroo==0.1.0) (1.21.1)
Requirement already satisfied: setuptools in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from numba->KINgaroo==0.1.0) (59.8.0)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from numba->KINgaroo==0.1.0) (0.38.1)
Requirement already satisfied: python-dateutil>=2.7.3 in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from pandas->KINgaroo==0.1.0) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from pandas->KINgaroo==0.1.0) (2022.4)
Requirement already satisfied: six in ./anaconda3/envs/KIN/lib/python3.8/site-packages (from pybedtools->KINgaroo==0.1.0) (1.16.0)
Building wheels for collected packages: KINgaroo
  Building wheel for KINgaroo (pyproject.toml) ... done
  Created wheel for KINgaroo: filename=KINgaroo-0.1.0-py3-none-any.whl size=15947 sha256=a38255a00598badabe25e1e26d7644a885412e4cc43c61e6ecf3bd4c820dcbab
  Stored in directory: /tmp/pip-ephem-wheel-cache-h4bt986k/wheels/74/10/57/4a7c19927568803788c84e7a4be2402c4eb6fd5a4235d539ba
Successfully built KINgaroo
Installing collected packages: KINgaroo
Successfully installed KINgaroo-0.1.0

(KIN) archeogen@archeogen-MS-7B17:~$ KINgaroo -bam /mnt/83a60764-36ca-41f4-b92d-3b3f715ba1a7/AvarJena/KIN -bed /mnt/83a60764-36ca-41f4-b92d-3b3f715ba1a7/AvarJena/KIN -T /mnt/83a60764-36ca-41f4-b92d-3b3f715ba1a7/AvarJena/KIN/bamlist.txt -cnt 0
Merging all chromosomes..
Creating input files from chromosome 1...
Creating input files from chromosome 2...
Creating input files from chromosome 3...
Creating input files from chromosome 4...
Creating input files from chromosome 5...
Creating input files from chromosome 6...
Creating input files from chromosome 7...
Creating input files from chromosome 8...
Creating input files from chromosome 9...
Creating input files from chromosome 10...
Creating input files from chromosome 11...
Creating input files from chromosome 12...
Creating input files from chromosome 13...
Creating input files from chromosome 14...
Creating input files from chromosome 15...
Creating input files from chromosome 17...
Creating input files from chromosome 16...
Creating input files from chromosome 19...
Creating input files from chromosome 18...
Creating input files from chromosome 20...
Creating input files from chromosome 21...
Creating input files from chromosome 22...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/KINgaroo/KINgaroo_scripts/helpers.py", line 119, in get_merged_chrm
    probs_list, pos_list, chrm_list = hapProbsAll(haplist=pslist, hap='noid')
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/KINgaroo/KINgaroo_scripts/input_preparation_functions.py", line 96, in hapProbsAll
    x=pd.read_csv(fi, sep=",", header=0,index_col=0)
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
    self._open_handles(src, kwds)
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/pandas/io/parsers/base_parser.py", line 222, in _open_handles
    self.handles = get_handle(
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/pandas/io/common.py", line 701, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'hapProbs/hapProbs_HMU004.A0101.TF2_rmdup_chrm1_probs.csv'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/archeogen/anaconda3/envs/KIN/bin/KINgaroo", line 8, in <module>
    sys.exit(main.main())
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/KINgaroo/main.py", line 81, in main
    libraries, listf, dwins, twins, id_dwins, id_twins, chrmlist = hel.pipeline1(targetsfile = args.target_location,
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/KINgaroo/KINgaroo_scripts/helpers.py", line 301, in pipeline1
    dwins,twins,id_dwins,id_twins, chrmlist = parallel_mergedchrm(libraries=libraries, totalch=chrmf, interval=interval, cores=cores)
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/KINgaroo/KINgaroo_scripts/helpers.py", line 139, in parallel_mergedchrm
    allf = [p.get() for p in res]
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/site-packages/KINgaroo/KINgaroo_scripts/helpers.py", line 139, in <listcomp>
    allf = [p.get() for p in res]
  File "/home/archeogen/anaconda3/envs/KIN/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'hapProbs/hapProbs_HMU004.A0101.TF2_rmdup_chrm1_probs.csv'

Thank you!

DivyaratanPopli commented 1 year ago

Hi, I think the issue may be with the format of the bam files. Can you check that the chromosomes are named 1, 2, ... and not chr1, chr2, ...?
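One way to run this check is to look at the `SN:` fields of the `@SQ` lines in the BAM header (for example, the text printed by `samtools view -H file.bam`). The sketch below works on that header text only. The function names are hypothetical and not part of KINgaroo, and the two-contig header in the demo is made up for illustration.

```python
def sq_names(header_text):
    """Extract reference sequence names (SN fields) from SAM header text."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def chr_prefixed(names):
    """Return the reference names that carry a 'chr' prefix."""
    return [name for name in names if name.lower().startswith("chr")]


if __name__ == "__main__":
    # Illustrative header: one contig named "1", one named "chr2".
    example = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@SQ\tSN:1\tLN:249250621\n"
        "@SQ\tSN:chr2\tLN:243199373\n"
    )
    print(chr_prefixed(sq_names(example)))
```

An empty result means no contig name starts with `chr`, which is the naming the maintainer asks for here.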

szecsenyinagy commented 1 year ago

Bam files look OK, no chr tag:
@PG ID:bowtie2 PN:bowtie2 VN:2.4.4 CL:"/usr/bin/bowtie2-align-s --wrapper basic-0 --very-fast --local -t --met-file ./OUT_L//KLB002/ALNFILES/KLB002.metrics.log -p 20 -x /media/kjakab/Adat/PAPLINE_SCRIPTS/REFS/human/GRCh37.p13.genome --passthrough -U ./OUT_L//KLB002/MERGE/KLB002.t2.fastq.gz"
@PG ID:samtools PN:samtools PP:bowtie2 VN:1.13 CL:samtools view -1 --threads 8 ./OUT_L//KLB002/ALNFILES/KLB002.bowtie2_evf.sam
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.13 CL:samtools sort --threads 8 -o ./OUT_L//KLB002/ALNFILES/KLB002.bowtie2_evf.sorted.bam ./OUT_L//KLB002/ALNFILES/KLB002.bowtie2_evf.bam
@PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.13 CL:samtools view -h KLB002.bowtie2_local.sorted.dedup.trim_2_bp.bam
@PG ID:samtools.3 PN:samtools PP:samtools.2 VN:1.13 CL:samtools view -h -b
@PG ID:samtools.4 PN:samtools PP:samtools.3 VN:1.13 CL:samtools view -h KLB002.bowtie2_local.sorted.dedup.trim_2_bp.bam-.bam
@PG ID:samtools.5 PN:samtools PP:samtools.4 VN:1.13 CL:samtools view -h -b
K00233:266:HNN5VBBXY:4:2110:19522:44042 16 1 10061 1 7S39M 0 0 NNTCTATTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTNN !!JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJ!! AS:i:78 XS:i:78 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:39 YT:Z:UU
K00233:266:HNN5VBBXY:3:1117:21775:6765 0 1 10062 1 44M6S 0 0 NNCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAATAGANN !!JFJFJJJFJJJJJ<AAJJJFJJJJJFAAJJFFFFAJFFFJJ<A-<7!! AS:i:88 XS:i:88 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:44 YT:Z:UU
K00233:266:HNN5VBBXY:3:2106:14610:40244 16 1 10150 0 7S37M1I9M1S * 0 0 NNTCTATCTAACCCTAACCCTAACCCTAACCCTAACCTAAACCTAAACCCTAANN !!JJJJJJJJJJJJJJFJJJJJJJJJJJFJJJFJJJJJJJJJJJJJFJJJJJJ!! AS:i:76 XS:i:76 XN:i:0 XM:i:1 XO:i:1 XG:i:1 NM:i:2 MD:Z:33C12 YT:Z:UU
...........

bed file also OK:
1 752565 752566 G A
1 776545 776546 A G
1 832917 832918 T C
1 842012 842013 T G
1 846863 846864 G C
1 869302 869303 C T
1 891020 891021 G A
1 893461 893462 C T
1 896270 896271 C T.....

The same files were used by my colleague, who ran it successfully, so I am very curious about the issue here... KINgaroo creates the folders but they remain empty, and the program exits after a few seconds. Thank you for your time!

DivyaratanPopli commented 1 year ago

Hi, it looks like the bed file is space-separated, and the package expects a tab-separated bed file. Maybe that is the issue.
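Because pasted text often turns tabs into spaces, the delimiter is easier to confirm on the file itself than by eye. A minimal sketch, assuming a bed file with at least three columns per row; the helper name `bed_delimiter_report` is hypothetical and not part of KINgaroo.

```python
from collections import Counter


def bed_delimiter_report(lines, min_fields=3):
    """Count how many non-empty lines look tab-separated, space-separated, or neither."""
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if len(line.split("\t")) >= min_fields:
            counts["tab"] += 1
        elif len(line.split()) >= min_fields:
            counts["space"] += 1
        else:
            counts["other"] += 1
    return dict(counts)


if __name__ == "__main__":
    rows = ["1\t752565\t752566\tG\tA\n", "1 776545 776546 A G\n"]
    print(bed_delimiter_report(rows))
```

To check a real file, pass an open file handle as `lines`, for example `bed_delimiter_report(open(path))`, where `path` is the bed file given to `-bed`.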

szecsenyinagy commented 1 year ago

Sorry for not replying earlier. The bed file is tab-separated; it worked well for my colleague, and the bam files are the same. I suspect some other compatibility or system problem here. Thanks a lot for any suggestions!

DivyaratanPopli commented 1 year ago

Hi, I was looking at the error you showed, and I see that there is "rmdup" in the name of the file that was not found. There was an issue with the rmdup function earlier, so I removed this function from KINgaroo. Maybe you're using an older version; if you clone the GitHub repository again, that should solve the issue.

ymeili commented 1 year ago

I have the same error as the original poster. I tried the solution you suggested here, but it does not work for me. Are there other possible causes for the error? Thanks a lot.

DivyaratanPopli commented 1 year ago

Hi, if you see "rmdup" in the error, you are using an old version: the latest version does not have any rmdup operation.
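A rough way to see whether the installed copy is still the old one is to search its source files for the string `rmdup`. The sketch below is only a heuristic built on the maintainer's statement that the latest version has no rmdup operation. The helper name is hypothetical, and the directory to scan (for example the `site-packages/KINgaroo` folder shown in the traceback) has to be supplied by the user.

```python
from pathlib import Path


def files_mentioning(root, needle="rmdup"):
    """List .py files under root (relative paths) whose text contains needle."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text:
            hits.append(str(path.relative_to(root)))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python check_rmdup.py /path/to/site-packages/KINgaroo
    for name in files_mentioning(sys.argv[1]):
        print(name)
```

If any files are listed, the environment is probably still importing an older install, even if a newer clone exists elsewhere on disk.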