brentp / peddy

genotype :: ped correspondence check, ancestry check, sex check. directly, quickly on VCF
MIT License

error loading tabix index #72

Closed ccymak closed 4 years ago

ccymak commented 4 years ago

May I ask if this is a problem with the vcf file?

peddy --plot -p $cpu --loglevel DEBUG --prefix tof99test /home/ccymak/tof_exome/TOF_Solexa_99.genotypecalls.vcf /home/ccymak/tof_exome/2018_SS-180814-01a/TOF_99.ped

Thanks


python3/3.6.4 is loaded
2019-10-24 15:47:00 paedbc01 peddy.cli[223643] INFO Running Peddy version 0.4.3
/home/ccymak/peddy/peddy/cli.py:198: FutureWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  ped_df = ped_df.ix[samples, :]
/home/ccymak/.local/lib/python3.6/site-packages/pandas/core/indexing.py:822: FutureWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
2019-10-24 15:47:00 paedbc01 peddy.cli[223643] INFO ped_check
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/software/python/3.6.4/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "cyvcf2/cyvcf2.pyx", line 93, in cyvcf2.cyvcf2._par_relatedness
  File "cyvcf2/cyvcf2.pyx", line 783, in cyvcf2.cyvcf2.VCF._site_relatedness
  File "cyvcf2/cyvcf2.pyx", line 658, in gen_variants
  File "cyvcf2/cyvcf2.pyx", line 376, in __call__
AssertionError: error loading tabix index for b'/home/ccymak/tof_exome/TOF_Solexa_99.genotypecalls.vcf'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ccymak/.local/bin/peddy", line 11, in <module>
    load_entry_point('peddy', 'console_scripts', 'peddy')()
  File "/home/mullinyu/.local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/mullinyu/.local/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/mullinyu/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mullinyu/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/ccymak/peddy/peddy/cli.py", line 209, in peddy
    in ("ped_check", "het_check", "sex_check")]):
  File "/home/ccymak/peddy/peddy/cli.py", line 43, in run
    prefix=prefix, **kwargs)
  File "/home/ccymak/peddy/peddy/peddy.py", line 970, in ped_check
    min_depth=min_depth, each=each)
  File "cyvcf2/cyvcf2.pyx", line 39, in cyvcf2.cyvcf2.par_relatedness
  File "/software/python/3.6.4/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
AssertionError: error loading tabix index for b'/home/ccymak/tof_exome/TOF_Solexa_99.genotypecalls.vcf'

brentp commented 4 years ago

Yes, your VCF should be bgzipped and indexed.
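A minimal sketch of that step, assuming the htslib command-line tools `bgzip` and `tabix` are installed and on `PATH`, and that the VCF is already sorted by chromosome and position (tabix requires this). The helper names `bgzip_and_index_commands` and `bgzip_and_index` are hypothetical and are not part of peddy.

```python
import subprocess
from pathlib import Path


def bgzip_and_index_commands(vcf_path):
    """Build the argv lists for compressing and indexing a plain-text VCF.

    Hypothetical helper for illustration; it is not part of peddy.
    """
    vcf = Path(vcf_path)
    gz = vcf.with_name(vcf.name + ".gz")
    commands = [
        ["bgzip", "-c", str(vcf)],        # write BGZF-compressed data to stdout
        ["tabix", "-p", "vcf", str(gz)],  # create <file>.vcf.gz.tbi beside the file
    ]
    return commands, gz


def bgzip_and_index(vcf_path):
    """Run both commands; assumes bgzip and tabix are installed and on PATH."""
    (bgzip_cmd, tabix_cmd), gz = bgzip_and_index_commands(vcf_path)
    with open(gz, "wb") as out:
        subprocess.run(bgzip_cmd, stdout=out, check=True)
    subprocess.run(tabix_cmd, check=True)
    return gz
```

The returned `.vcf.gz` path is the one to pass to peddy in place of the plain `.vcf`.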

ccymak commented 4 years ago

Dear Brent,

Even with a vcf.gz file as input, these errors still occur. Is the .ix indexer the problem? Could it have anything to do with the Python version?

Many Thanks

Christopher C Mak, Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong


python3/3.6.4 is loaded
2019-10-25 09:54:44 paedbc01 peddy.cli[407545] INFO Running Peddy version 0.4.3
/home/ccymak/peddy/peddy/cli.py:198: FutureWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  ped_df = ped_df.ix[samples, :]
/home/ccymak/.local/lib/python3.6/site-packages/pandas/core/indexing.py:822: FutureWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
2019-10-25 09:54:44 paedbc01 peddy.cli[407545] INFO ped_check
2019-10-25 09:54:50 paedbc01 peddy.peddy[407545] INFO plotting
2019-10-25 09:54:51 paedbc01 peddy.cli[407545] INFO ran in 7.7 seconds
2019-10-25 09:54:52 paedbc01 peddy.cli[407545] INFO het_check
2019-10-25 09:54:58 paedbc01 peddy.pca[407545] INFO loaded and subsetted thousand-genomes genotypes (shape: (2504, 11496)) in 0.6 seconds
/home/ccymak/.local/lib/python3.6/site-packages/sklearn/svm/base.py:193: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)
2019-10-25 09:54:59 paedbc01 peddy.pca[407545] INFO ran randomized PCA on thousand-genomes samples at 11496 sites in 0.7 seconds
2019-10-25 09:54:59 paedbc01 peddy.pca[407545] INFO Projected thousand-genomes genotypes and sample genotypes and predicted ancestry via SVM in 0.2 seconds
2019-10-25 09:55:00 paedbc01 peddy.cli[407545] INFO ran in 8.2 seconds
/home/ccymak/peddy/peddy/cli.py:224: FutureWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  ped_df[col_name] = list(df[col].ix[samples])
2019-10-25 09:55:00 paedbc01 peddy.cli[407545] INFO sex_check
no intervals found for b'/home/ccymak/tof_exome/TOF_Solexa_99.genotypecalls.vcf.gz' at X:2781480
2019-10-25 09:55:01 paedbc01 peddy.peddy[407545] INFO sex-check: 0 skipped / 10000 kept
2019-10-25 09:55:01 paedbc01 peddy.cli[407545] INFO ran in 1.3 seconds



brentp commented 4 years ago

Hi, does your VCF file also have an index file (.csi or .tbi)?
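One way to check both conditions before running peddy is sketched below. It assumes the usual htslib convention that the index sits next to the VCF as `<file>.tbi` or `<file>.csi`, and that the file was written by `bgzip`, which places the `BC` extra subfield first in the gzip header. The function names are hypothetical, and the header test only inspects the first block.

```python
from pathlib import Path


def looks_bgzipped(path):
    """Return True if the file starts with a BGZF header.

    BGZF is gzip with the FEXTRA flag set and a 'BC' extra subfield;
    plain gzip output normally lacks that subfield.
    """
    with open(path, "rb") as fh:
        header = fh.read(14)
    return (
        len(header) == 14
        and header[:4] == b"\x1f\x8b\x08\x04"  # gzip magic, deflate, FEXTRA
        and header[12:14] == b"BC"             # BGZF subfield identifier
    )


def find_index(path):
    """Return the path of a .tbi or .csi index beside the file, or None."""
    for suffix in (".tbi", ".csi"):
        candidate = Path(str(path) + suffix)
        if candidate.exists():
            return candidate
    return None
```

If `looks_bgzipped` is false, the file was probably compressed with plain `gzip` and needs to be recompressed with `bgzip`. If `find_index` returns `None`, the index still has to be built.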

ccymak commented 4 years ago

Dear Brent,

Thanks so much for your reply. I have redone the indexing and it works fine now. I suppose I can just ignore the rest of the warnings?

For the relatedness check, am I right in saying that related samples will have a high IBS2 and a low IBS0, putting them in the top left, above the other samples?

Thanks again for creating such a great tool!

Regards,

Christopher C Mak, Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong


2019-10-30 01:39:57 hpch01 peddy.cli[168547] INFO Running Peddy version 0.4.3
/home/ccymak/peddy/peddy/cli.py:198: FutureWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  ped_df = ped_df.ix[samples, :]
/home/ccymak/.local/lib/python3.6/site-packages/pandas/core/indexing.py:822: FutureWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
2019-10-30 01:39:57 hpch01 peddy.cli[168547] INFO ped_check
2019-10-30 01:40:10 hpch01 peddy.peddy[168547] INFO plotting
2019-10-30 01:40:12 hpch01 peddy.cli[168547] INFO ran in 14.6 seconds
2019-10-30 01:40:12 hpch01 peddy.cli[168547] INFO het_check
2019-10-30 01:40:21 hpch01 peddy.pca[168547] INFO loaded and subsetted thousand-genomes genotypes (shape: (2504, 11496)) in 0.8 seconds
/home/ccymak/.local/lib/python3.6/site-packages/sklearn/svm/base.py:193: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)
2019-10-30 01:40:22 hpch01 peddy.pca[168547] INFO ran randomized PCA on thousand-genomes samples at 11496 sites in 1.1 seconds
2019-10-30 01:40:22 hpch01 peddy.pca[168547] INFO Projected thousand-genomes genotypes and sample genotypes and predicted ancestry via SVM in 0.2 seconds
2019-10-30 01:40:23 hpch01 peddy.cli[168547] INFO ran in 11.3 seconds
/home/ccymak/peddy/peddy/cli.py:224: FutureWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  ped_df[col_name] = list(df[col].ix[samples])
2019-10-30 01:40:23 hpch01 peddy.cli[168547] INFO sex_check
no intervals found for b'/home/ccymak/tof_exome/TOF_Solexa_99.genotypecalls.vcf.gz' at X:2781480
2019-10-30 01:40:25 hpch01 peddy.peddy[168547] INFO sex-check: 0 skipped / 10000 kept
2019-10-30 01:40:25 hpch01 peddy.cli[168547] INFO ran in 2.1 seconds


brentp commented 4 years ago

Yes, that's correct. You can also change one axis to show relatedness, which might be easier to interpret.
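To make the IBS0 and IBS2 terms concrete, here is a toy sketch using their general definitions: a site counts toward IBS0 when the two samples share no alleles (one homozygous reference, the other homozygous alternate) and toward IBS2 when their genotypes are identical. Genotypes are encoded as alternate-allele counts (0, 1, 2), with `None` for a missing call. The function `ibs_counts` is hypothetical; peddy's own counts come from cyvcf2 and may select or filter sites differently.

```python
def ibs_counts(gt_a, gt_b):
    """Count IBS0 and IBS2 sites between two samples.

    gt_a, gt_b: sequences of alternate-allele counts (0, 1 or 2),
    with None marking a missing genotype call.
    Returns (ibs0, ibs2, n_compared).
    """
    ibs0 = ibs2 = n = 0
    for a, b in zip(gt_a, gt_b):
        if a is None or b is None:
            continue  # skip sites where either call is missing
        n += 1
        if a == b:
            ibs2 += 1  # identical genotypes: both alleles shared
        elif abs(a - b) == 2:
            ibs0 += 1  # hom-ref vs hom-alt: no alleles shared
    return ibs0, ibs2, n


sample_a = [0, 2, 1, 0, 2, None]
sample_b = [2, 0, 1, 0, 1, 1]
print(ibs_counts(sample_a, sample_b))  # (2, 2, 5)
```

A parent and child share at least one allele at every site, so apart from genotyping errors their IBS0 count should be close to zero, which is why such pairs separate from unrelated pairs on the plot.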