francesccoll / powerbacgwas

PowerBacGWAS: Power calculations for Bacterial GWAS
https://github.com/francesccoll/powerbacgwas
GNU General Public License v3.0

Error Messages when Running Ancestral_State_Reconstruction_Roary.py and Kp.gwas_runs.sh #2

Open erin-thei opened 1 year ago

erin-thei commented 1 year ago

Hello,

Thank you for this tool. I am attempting to use this to do power calculations for pan-genome GWAS. These are the commands I have run:

python3 ./scripts/annotate_nodes_newick.py --input_tree RAxML_bipartitions.bootmap --output_tree tree.annotated.nwk

python3 ./scripts/roary_to_pastml_matrix.py --gene_presence_absence gene_presence_absence.Rtab --input_format 2 --output_table pastml.csv

python3 ./scripts/roary_to_plink_files.py --gene_presence_absence gene_presence_absence.Rtab --input_format 2 --output_prefix kp.pg

python3 ./scripts/ancestral_state_reconstruction_roary.py --input_pastml_table pastml.csv --input_tree tree.annotated.nwk --output_table ancestral.csv --output_steps ancestral_steps.csv --process 8 # this command produced an error

python3 ./pyseer/scripts/phylogeny_distance.py --calc-C --midpoint tree.annotated.nwk > tree_distances.csv

python3 ./scripts/prepare_gwas_runs_roary.py --roary_table gene_presence_absence.Rtab --input_format 2 --parameters_file paramters.binary.efs.txt --code_directory ./scripts/ --pyseer_path ./pyseer/pyseer-runner.py --similarity tree_distances.csv --plink_prefix kp.pg --pastml_steps_file ancestral_steps.csv --output_dir output_dir --output_prefix kp

bash kp.gwas_runs.sh

python3 ./scripts/process_gwas_runs.py --gwas_runs_in_table kp.gwas_runs.csv --variant_type r --output_dir ./output_dir/ --gwas_runs_out_table kp.ph.gwas_runs.results.csv

Rscript ./scripts/plot_gwas_runs.R --input_table kp.ph.gwas_runs.results.csv --parameters_file paramters.binary.efs.txt --plot_type 1 -v 12034 --output_plot kp.gwas_runs.results.plot.pdf

When I ran ancestral_state_reconstruction_roary.py I got some error messages about files not being found. I was still able to get the necessary outputs from this step, but I am wondering if these error messages affected the final plot. Here is the output from the command: ancestral_state_error.txt

Additionally, when I ran 'bash kp.gwas_runs.sh' I received another set of errors. Again, the script did not exit so I was able to obtain the necessary outputs, but I am wondering what these error messages are and how they may have affected the power calculations.

This is a truncated version of the error message when running kp.gwas_runs.sh as the full output is too large. I can send you the entire error message, if necessary. Just trying to understand these errors and if/how they affect the final power calculations. Thanks!

No observations of group_5941 in selected samples
No observations of group_2493 in selected samples
10722 loaded variants
4093 pre-filtered variants
6629 tested variants
10722 printed variants
2023-02-06 11:21:04,013 INFO: Saving number of homoplasies (steps) per gene...
2023-02-06 11:21:04,016 INFO: Opening gene_presence_absence.Rtab file and calculating gene frequency...
2023-02-06 11:21:04,123 INFO: Opening gene_presence_absence file and calculating gene frequency. DONE.
2023-02-06 11:21:04,123 INFO: Selecting variants meeting criteria...
2023-02-06 11:21:04,132 INFO: Selecting variants meeting criteria. DONE.
2023-02-06 11:21:04,132 INFO: Randomly sampling of variants meeting criteria.
2023-02-06 11:21:04,547 INFO: Reading causal variant file /scr1/users/theillere/usda37/powerbacgwas2/output_dir/HWFRVODMREFP.causal_variants.txt...
2023-02-06 11:21:04,548 INFO: Opening gene_presence_absence.Rtab file and calculating gene frequency...
2023-02-06 11:21:04,594 INFO: Opening gene_presence_absence file and calculating gene frequency. DONE.
2023-02-06 11:21:04,594 INFO: Extracting samples with and without causal variant(s) (mutated and wild-type)...
2023-02-06 11:21:04,594 INFO: Calculating number of cases and controls, with and without the causal variant, to achieve the chosen odds ratio...
Traceback (most recent call last):
  File "/scr1/users/theillere/usda37/powerbacgwas2/scripts/simulate_binary_phenotype_roary.py", line 406, in <module>
    _main()
  File "/scr1/users/theillere/usda37/powerbacgwas2/scripts/simulate_binary_phenotype_roary.py", line 358, in _main
    args=(len(roary_samples_mut), len(roary_samples_wt), float(args.allele_frequency), int(sample_size), float(odds_ratio)),
ValueError: could not convert string to float: 'NA'
mv: cannot stat '/scr1/users/theillere/usda37/powerbacgwas2/output_dir/HWFRVODMREFP.phen': No such file or directory
Traceback (most recent call last):
  File "./pyseer/pyseer-runner.py", line 8, in <module>
    main()
  File "/scr1/users/theillere/usda37/powerbacgwas2/pyseer/pyseer/main.py", line 305, in main
    p = load_phenotypes(options.phenotypes, options.phenotype_column)
  File "/scr1/users/theillere/usda37/powerbacgwas2/pyseer/pyseer/input.py", line 37, in load_phenotypes
    p = pd.read_csv(infile, index_col=0, sep='\t')
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '/scr1/users/theillere/usda37/powerbacgwas2/output_dir/HWFRVODMREFP.pyseer.phen'
[the pyseer traceback above is printed 5 times in total for HWFRVODMREFP]
2023-02-06 11:21:09,910 INFO: Saving number of homoplasies (steps) per gene...
2023-02-06 11:21:09,913 INFO: Opening gene_presence_absence.Rtab file and calculating gene frequency...
2023-02-06 11:21:10,000 INFO: Opening gene_presence_absence file and calculating gene frequency. DONE.
2023-02-06 11:21:10,000 INFO: Selecting variants meeting criteria...
2023-02-06 11:21:10,009 INFO: Selecting variants meeting criteria. DONE.
2023-02-06 11:21:10,009 INFO: Randomly sampling of variants meeting criteria.
2023-02-06 11:21:10,430 INFO: Reading causal variant file /scr1/users/theillere/usda37/powerbacgwas2/output_dir/KNMUBUQIHASD.causal_variants.txt...
2023-02-06 11:21:10,431 INFO: Opening gene_presence_absence.Rtab file and calculating gene frequency...
2023-02-06 11:21:10,476 INFO: Opening gene_presence_absence file and calculating gene frequency. DONE.
2023-02-06 11:21:10,476 INFO: Extracting samples with and without causal variant(s) (mutated and wild-type)...
2023-02-06 11:21:10,477 INFO: Calculating number of cases and controls, with and without the causal variant, to achieve the chosen odds ratio...
Traceback (most recent call last):
  File "/scr1/users/theillere/usda37/powerbacgwas2/scripts/simulate_binary_phenotype_roary.py", line 406, in <module>
    _main()
  File "/scr1/users/theillere/usda37/powerbacgwas2/scripts/simulate_binary_phenotype_roary.py", line 358, in _main
    args=(len(roary_samples_mut), len(roary_samples_wt), float(args.allele_frequency), int(sample_size), float(odds_ratio)),
ValueError: could not convert string to float: 'NA'
mv: cannot stat '/scr1/users/theillere/usda37/powerbacgwas2/output_dir/KNMUBUQIHASD.phen': No such file or directory
Traceback (most recent call last):
  File "./pyseer/pyseer-runner.py", line 8, in <module>
    main()
  File "/scr1/users/theillere/usda37/powerbacgwas2/pyseer/pyseer/main.py", line 305, in main
    p = load_phenotypes(options.phenotypes, options.phenotype_column)
  File "/scr1/users/theillere/usda37/powerbacgwas2/pyseer/pyseer/input.py", line 37, in load_phenotypes
    p = pd.read_csv(infile, index_col=0, sep='\t')
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '/scr1/users/theillere/usda37/powerbacgwas2/output_dir/KNMUBUQIHASD.pyseer.phen'
[the pyseer traceback above is printed 5 times in total for KNMUBUQIHASD]
2023-02-06 11:21:15,844 INFO: Saving number of homoplasies (steps) per gene...
2023-02-06 11:21:15,847 INFO: Opening gene_presence_absence.Rtab file and calculating gene frequency...
2023-02-06 11:21:15,935 INFO: Opening gene_presence_absence file and calculating gene frequency. DONE.
2023-02-06 11:21:15,935 INFO: Selecting variants meeting criteria...
2023-02-06 11:21:15,944 INFO: Selecting variants meeting criteria. DONE.
2023-02-06 11:21:15,944 INFO: Randomly sampling of variants meeting criteria.
2023-02-06 11:21:16,354 INFO: Reading causal variant file /scr1/users/theillere/usda37/powerbacgwas2/output_dir/XPZEDQRQDHGK.causal_variants.txt...
2023-02-06 11:21:16,355 INFO: Opening gene_presence_absence.Rtab file and calculating gene frequency...
2023-02-06 11:21:16,401 INFO: Opening gene_presence_absence file and calculating gene frequency. DONE.
2023-02-06 11:21:16,401 INFO: Extracting samples with and without causal variant(s) (mutated and wild-type)...
2023-02-06 11:21:16,401 INFO: Calculating number of cases and controls, with and without the causal variant, to achieve the chosen odds ratio...
Traceback (most recent call last):
  File "/scr1/users/theillere/usda37/powerbacgwas2/scripts/simulate_binary_phenotype_roary.py", line 406, in <module>
    _main()
  File "/scr1/users/theillere/usda37/powerbacgwas2/scripts/simulate_binary_phenotype_roary.py", line 358, in _main
    args=(len(roary_samples_mut), len(roary_samples_wt), float(args.allele_frequency), int(sample_size), float(odds_ratio)),
ValueError: could not convert string to float: 'NA'
mv: cannot stat '/scr1/users/theillere/usda37/powerbacgwas2/output_dir/XPZEDQRQDHGK.phen': No such file or directory
Traceback (most recent call last):
  File "./pyseer/pyseer-runner.py", line 8, in <module>
    main()
  File "/scr1/users/theillere/usda37/powerbacgwas2/pyseer/pyseer/main.py", line 305, in main
    p = load_phenotypes(options.phenotypes, options.phenotype_column)
  File "/scr1/users/theillere/usda37/powerbacgwas2/pyseer/pyseer/input.py", line 37, in load_phenotypes
    p = pd.read_csv(infile, index_col=0, sep='\t')
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/theillere/miniconda3/envs/powerbacgwas/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '/scr1/users/theillere/usda37/powerbacgwas2/output_dir/XPZEDQRQDHGK.pyseer.phen'
[the pyseer traceback above is printed 5 times in total for XPZEDQRQDHGK]
2023-02-06 11:21:21,694 INFO: Saving number of homoplasies (steps) per gene...
2023-02-06 11:21:21,698 INFO: Opening gene_presence_absence.Rtab file and calculating gene frequency...
2023-02-06 11:21:21,786 INFO: Opening gene_presence_absence file and calculating gene frequency. DONE.
2023-02-06 11:21:21,786 INFO: Selecting variants meeting criteria...
2023-02-06 11:21:21,795 INFO: Selecting variants meeting criteria. DONE.
2023-02-06 11:21:21,795 INFO: Randomly sampling of variants meeting criteria.
2023-02-06 11:21:22,211 INFO: Reading causal variant file /scr1/users/theillere/usda37/powerbacgwas2/output_dir/NRRUMMWMXBFS.causal_variants.txt...
2023-02-06 11:21:22,212 INFO: Opening gene_presence_absence.Rtab file and calculating gene frequency...
2023-02-06 11:21:22,257 INFO: Opening gene_presence_absence file and calculating gene frequency. DONE.
2023-02-06 11:21:22,257 INFO: Extracting samples with and without causal variant(s) (mutated and wild-type)...
2023-02-06 11:21:22,258 INFO: Calculating number of cases and controls, with and without the causal variant, to achieve the chosen odds ratio...
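
In the log above, each run ID (for example HWFRVODMREFP) gets a `.causal_variants.txt` file, but the later `mv` and pyseer steps fail because the matching `.phen` / `.pyseer.phen` file does not exist. One rough way to gauge how many runs were affected is to list run IDs that have a causal-variants file but no pyseer phenotype file. This is a minimal sketch, not part of PowerBacGWAS: the function name `runs_missing_phenotype` is hypothetical, and it assumes the file naming seen in the log (`<ID>.causal_variants.txt`, `<ID>.pyseer.phen`) holds for every run and that these files are not cleaned up afterwards.

```python
from pathlib import Path


def runs_missing_phenotype(output_dir):
    """Return run IDs that have <ID>.causal_variants.txt but no <ID>.pyseer.phen.

    Assumption: file names follow the pattern seen in the log above.
    """
    out = Path(output_dir)
    suffix = ".causal_variants.txt"
    # Strip the suffix from each causal-variants file name to recover the run ID.
    run_ids = sorted(p.name[: -len(suffix)] for p in out.glob("*" + suffix))
    return [rid for rid in run_ids if not (out / (rid + ".pyseer.phen")).exists()]
```

Calling `runs_missing_phenotype("output_dir")` from the working directory would return the IDs of runs whose simulated phenotype was never written; an empty list would suggest that every run produced one.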

francesccoll commented 1 year ago

Hi,

Thanks for using PowerBacGWAS for your research.

Would you be able to share the original phylogenetic tree file (RAxML_bipartitions.bootmap) and pan-genome table (gene_presence_absence.Rtab)? I can give it a go to see if I can reproduce this error.

You may also want to try using the Docker/Nextflow implementation to see if you get the same error (just to rule out dependency issues).

erin-thei commented 1 year ago

Thanks for the quick reply! I've attached the two input files I used. As for trying the Docker/Nextflow implementation, I've been having issues with using Nextflow on our cluster. I'm trying to work through them, but needed a quicker option - which is why I opted to use the individual commands.

powerbacGWAS_input_files.zip

francesccoll commented 1 year ago

It looks like this was not an issue with dependencies but with the format of the input file (gene_presence_absence.Rtab), specifically the ~ symbol in the gene names, which made pastml crash. After editing the input pan-genome file:

cat gene_presence_absence.Rtab | sed 's/~/_/g' > gene_presence_absence.edited.Rtab

the command:

python3 ./scripts/ancestral_state_reconstruction_roary.py --input_pastml_table pastml.csv --input_tree tree.annotated.nwk --output_table ancestral.csv --output_steps ancestral_steps.csv --process 8

ran without errors. Make sure all PowerBacGWAS commands before this one are also run with the edited input file, without ~ symbols. The output table ancestral_steps.csv should have the same number of lines as the input file gene_presence_absence.edited.Rtab.
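
For anyone who prefers to do the same clean-up and line-count check from Python, the two steps could be sketched as below. The helper names (`replace_tildes`, `count_lines`, `same_line_count`) are made up for illustration and are not part of PowerBacGWAS; the replacement mirrors the `sed 's/~/_/g'` command, so it also touches any `~` outside the gene-name column.

```python
def replace_tildes(in_path, out_path):
    """Copy a text file, replacing every '~' with '_' (same effect as sed 's/~/_/g')."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(line.replace("~", "_"))


def count_lines(path):
    """Count the lines in a text file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def same_line_count(path_a, path_b):
    """True if both files have the same number of lines."""
    return count_lines(path_a) == count_lines(path_b)
```

After rerunning the earlier steps, `same_line_count("gene_presence_absence.edited.Rtab", "ancestral_steps.csv")` should return `True` if the reconstruction covered every gene.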

erin-thei commented 1 year ago

Great, thanks so much - that worked for the ancestral_state_reconstruction_roary.py step! It was able to produce the csv files with no error and ancestral_steps.csv had the same number of lines as the edited Rtab file.

I am still having issues when running 'bash kp.gwas_runs.sh'. When I ran this command with the edited Rtab, I got the same error messages that I posted in my initial message. When you tried to reproduce the issue, did this occur for you? I can try to post the full error message; however, it is very lengthy and repetitive.

Thanks for your help!
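
The first error in each failing run is `ValueError: could not convert string to float: 'NA'`, raised on the line that calls `float(args.allele_frequency)` and `float(odds_ratio)`. The thread does not establish which of the two values was 'NA' or where it came from, so the following is only a sketch of the failure itself and of one defensive way to handle such a value; `to_float_or_none` is a hypothetical helper, not a PowerBacGWAS function.

```python
def to_float_or_none(value):
    """Convert a parameter value to float, returning None for 'NA' or empty strings."""
    text = str(value).strip()
    if text == "" or text.upper() == "NA":
        return None
    return float(text)


# Reproduce the error from the log: float() cannot parse the string 'NA'.
try:
    float("NA")
except ValueError as err:
    message = str(err)  # could not convert string to float: 'NA'
```

A caller could then skip or report a run whose allele frequency or odds ratio comes back as `None`, instead of letting the script crash partway through.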

francesccoll commented 1 year ago

I can attempt to reproduce this issue. Can you also share the file 'paramters.binary.efs.txt'? It may also be worth running a single command line from kp.gwas_runs.sh on its own before running them all.
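
One way to try a single command in isolation is to pull the first command line out of the script and run only that. The sketch below assumes that kp.gwas_runs.sh holds one self-contained command per line, which has not been confirmed in this thread; `iter_commands` and `run_first_command` are illustrative names.

```python
import subprocess


def iter_commands(script_path):
    """Yield the non-empty, non-comment lines of a shell script."""
    with open(script_path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                yield stripped


def run_first_command(script_path):
    """Run only the first command line of the script through bash.

    Assumption: each line of the script is a complete command.
    Returns the CompletedProcess so stdout/stderr can be inspected.
    """
    first = next(iter_commands(script_path))
    return subprocess.run(
        ["bash", "-c", first],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode; also works on Python 3.6
    )
```

For example, `result = run_first_command("kp.gwas_runs.sh")` followed by `print(result.stderr)` would show the errors from that one run without the output of all the others.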