borenstein-lab / fishtaco

FishTaco (Functional Shifts Taxonomic Contributors) is a metagenomic computational framework that aims to identify the driver taxa of microbiome functional shifts

Error: Computing a differential abundance score for each taxa #8

Closed amyellison closed 4 years ago

amyellison commented 4 years ago

I have just got FishTaco running on my Linux system, and it works correctly with your example data. However, I am running into an error with my own data. The output:

Given parameters: {'taxa_abun_file': 'fishtaco_taxabund_12_nozero.csv', 'function_abun_file': 'fishtaco_funabund.tsv', 'class_file': 'fishtaco_samples_12.tsv', 'taxa_to_function_file': None, 'apply_inference': True, 'case_label': '1', 'control_label': '0', 'output_pref': 'fishtaco_out', 'map_function_level': 'pathway', 'map_function_file': None, 'perform_inference_on_ko_level': False, 'multiple_hypothesis_correction': 'FDR-0.05', 'max_da_functions_cases_controls': None, 'taxa_assessment_method': 'multi_taxa', 'score_to_compute': 'wilcoxon', 'max_score_cutoff': '100', 'na_rep': 'NA', 'number_of_permutations': '100', 'number_of_shapley_orderings_per_taxa': '5', 'da_result_file': None, 'single_function_filter': None, 'multi_function_filter_list': None, 'functional_profile_already_corrected_with_musicc': False, 'write_log': False, 'residual_mode': 'remove_residual', 'normalization_mode': 'scale_permuted', 'permutation_mode': 'blocks'}
Reading input files...
Running MUSiCC...
Loading data using pandas module... Done.
Performing MUSiCC Correction...
Learning sample-specific models ..........................................................................................................................................................................................................................................................................Done.
Performing MUSiCC Normalization... Done.
Done.
Running time was 20 seconds.
No input of genomic content given to FishTaco, inferring the mapping of taxa to functions from taxonomic and functional profiles
Mapping functions to pathway/module level...
Reading files... Done.
Writing output... Done.
Done.
Done.
Reducing taxa, function, and class data to contain the exact same set of samples... Done.

#controls = 66, #cases = 70

Computing a differential abundance score for each taxa...
Traceback (most recent call last):
  File "run_fishtaco.py", line 177, in <module>
    main(vars(given_args))
  File "/home/amy/microbiome/picrust2_standalone/fishtaco/envFT/lib/python3.6/site-packages/fishtaco/compute_contribution_to_DA.py", line 468, in main
    'alpha': args['alpha']}
KeyError: 'alpha'

I have checked my input file formats, and they are identical to the examples. I have also tried removing any taxa lines that are zero in all samples (I had some because my data is a subset of a larger dataset).

A little of my tab-delimited taxa abundance file:

Taxa | AE010719-479 | AE010719-480 | AE010719-481
43fddf1528d4a98928fd8c3a8ac23bfd | 0.411748379294621 | 0.07942961346447 | 0.5158757202156
0eb88b722902316b11770922bcdaca7b | 0.237564679526576 | 0.043107829362351 | 0.211278731181463
efbe1f58b1e2984ddc53a64f047d94ff | 0.124541543585575 | 0.015993542793937 | 0.077922681370423
496ecde24f9ab698992413d3d4f04b5f | 0.037925497115442 | 0.025350512690204 | 0.022458335914751
637b9b3f4d1cbb1a10c07817619cdf69 | 0.042802482107809 | 0.002570924636035 | 0.040874171364847
d46e2205f0c6ecf67b51f83d111c509c | 0.001645486806368 | 0.022121909658904 | 0.001254569109721
99e433a3ce4d5290445f668df2c9147e | 0.031403025316707 | 0.007294251292936 | 0.023108853230903
487de539f50da640cd8914cea7821561 | 0.020519022223985 | 0.002122507548354 | 0.017424570968342

A little of my tab-delimited function abundance file:

Function | AE010719-479 | AE010719-480 | AE010719-481
K00001 | 5.51279E-05 | 0.0002265119 | 4.20007E-05
K00002 | 0 | 8.654E-07 | 0
K00003 | 0.0003856851 | 0.0005163871 | 0.0003739182
K00004 | 3.7417E-06 | 2.20775E-05 | 4.9638E-06
K00005 | 2.286E-07 | 0.000106112 | 1.758E-07

A little of my tab-delimited sample information file:

Sample | Site
AE010719-479 | 1
AE010719-480 | 1
AE010719-481 | 1
AE010719-482 | 1
AE010719-483 | 1
AE010719-484 | 0
AE010719-485 | 0
AE010719-486 | 0
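For anyone wanting to double-check that the three inputs agree on sample IDs before running, a minimal sketch follows. It is only an illustration and not part of FishTaco: the helper names and the tiny inline tables (samples `S1` to `S4`, made-up values) are hypothetical, and it assumes tab-delimited files where the abundance tables list sample IDs in the header row and the sample file lists them in the first column.

```python
import csv
import io


def header_samples(text):
    """Sample IDs from the header row of a tab-delimited abundance table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return set(header[1:])  # first column holds the taxon/function ID


def label_samples(text):
    """Sample IDs from the first column of a tab-delimited sample file."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    return {row[0] for row in reader if row}


def shared_and_missing(taxa_text, func_text, class_text):
    """Return (IDs present in all three inputs, IDs absent from at least one)."""
    taxa = header_samples(taxa_text)
    func = header_samples(func_text)
    labels = label_samples(class_text)
    shared = taxa & func & labels
    missing = (taxa | func | labels) - shared
    return shared, missing


# Hypothetical toy inputs, not the data from this report.
taxa_text = "Taxa\tS1\tS2\tS3\nt1\t0.5\t0.4\t0.1\n"
func_text = "Function\tS1\tS2\tS3\nK00001\t0.1\t0.2\t0.3\n"
class_text = "Sample\tSite\nS1\t1\nS2\t0\nS4\t0\n"

shared, missing = shared_and_missing(taxa_text, func_text, class_text)
print(sorted(shared))   # ['S1', 'S2']
print(sorted(missing))  # ['S3', 'S4']
```

To use it on real files, read each file's contents into a string and pass them in the same order.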

Could you please advise what this error relates to? The only thing I can think of is that no taxa are significantly differentially abundant. I have tried without FDR correction and get the same error. I would be surprised if that were the case, as other programs find many significant differences in the same data.

Many thanks in advance for your help!

engal commented 4 years ago

Hmm, this error is a little odd, but hopefully I can help!

Based on the error message, it looks like your run_fishtaco.py script might be out of date? A couple of updates ago we added an argument to specify a custom alpha cutoff for multiple hypothesis testing, but it looks like your run_fishtaco.py is not recording that argument. However, your compute_contribution_to_DA.py file seems to be more recent, and is looking for the alpha argument.

Is there any chance you may have obtained files from different releases of FishTaco?
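To illustrate the kind of mismatch described above, here is a minimal sketch. It is not FishTaco's actual code: `build_settings` and `missing_keys` are hypothetical helpers, and the dictionary is a shortened stand-in for the parameters printed in the report, which contain no `alpha` entry.

```python
def build_settings(args):
    """Index a required key directly, as the failing line in the traceback does."""
    return {"alpha": args["alpha"]}


def missing_keys(args, required):
    """List the required keys that the calling script did not supply."""
    return [key for key in required if key not in args]


# Shortened stand-in for the printed parameters: note there is no 'alpha'.
old_style_args = {
    "multiple_hypothesis_correction": "FDR-0.05",
    "score_to_compute": "wilcoxon",
}

try:
    build_settings(old_style_args)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'alpha'

print(missing_keys(old_style_args, ["alpha"]))  # ['alpha']
```

If the "Given parameters" line of a run lacks a key that the installed library reads, the entry script and the library most likely come from different releases.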

amyellison commented 4 years ago

Ah yes - thank you!

While trying to get the install to work on my system, I must have downloaded an older version of the main script from the GitHub link in the manual. A fresh install of the latest version using pip, together with downloading the main script from here, worked.
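For anyone hitting the same problem, one way to see which release pip actually installed is sketched below. It assumes the distribution and the importable package are both named `fishtaco` (inferred from the `site-packages/fishtaco` path in the traceback), and `installed_version` is a hypothetical helper, not something FishTaco provides.

```python
import importlib.util


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for older interpreters such as 3.6
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


# Assumed name: 'fishtaco'. Prints None if it is not installed here.
print("installed version:", installed_version("fishtaco"))

# Show which copy of the package the interpreter would import, if any.
spec = importlib.util.find_spec("fishtaco")
print("package location:", spec.origin if spec else None)
```

Comparing that version with the release the main script was downloaded from should show whether the two match.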

Thanks for your help.

engal commented 4 years ago

Glad I could help. One question, though: when you installed via pip, did you not have access to the run_fishtaco.py script? When you install via pip, I think the run_fishtaco.py and test_fishtaco.py scripts should be added to your bin directory. If you're doing a local install (e.g. a user-specific installation or one in a virtual environment), they should be added to the bin directory associated with that installation environment.
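A small sketch for finding where the scripts ended up, using only the Python standard library. `locate_script` is a hypothetical helper. It reports the scripts directory of the interpreter it is run with, so it should be run inside the same virtual environment. A user-specific (`--user`) install uses a different directory that this sketch does not cover.

```python
import os
import shutil
import sysconfig


def locate_script(name):
    """Report where a script is expected for this interpreter and where PATH finds it."""
    scripts_dir = sysconfig.get_path("scripts")  # bin/ on Linux, Scripts\ on Windows
    expected = os.path.join(scripts_dir, name)
    return {
        "scripts_dir": scripts_dir,
        "expected_exists": os.path.isfile(expected),
        "on_path": shutil.which(name),  # None if not found on PATH
    }


for script in ("run_fishtaco.py", "test_fishtaco.py"):
    print(script, locate_script(script))
```

If `expected_exists` is true but `on_path` is `None`, the script is installed but the environment's bin directory is not on `PATH`.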

amyellison commented 4 years ago

Ah yes sorry they did go to bin - I was looking in lib not bin!

engal commented 4 years ago

Ok, thanks for the information! I'll close this issue out.