biocore / gemelli

Gemelli is a toolbox for running Robust Aitchison PCA (RPCA), Joint Robust Aitchison PCA (Joint-RPCA), TEMPoral TEnsor Decomposition (TEMPTED), and Compositional Tensor Factorization (CTF) on sparse compositional omics datasets.
BSD 3-Clause "New" or "Revised" License

qc-rarefy errored on the first sample that is in the unrarefied and not in the rarefied dataset #83

Closed: callaband closed this issue 5 months ago

callaband commented 8 months ago

Using qiime2-amplicon-2024.2 from the CLI, qc-rarefy errored on the first sample that is in the unrarefied dataset but not in the rarefied dataset.

Code:

qiime gemelli rpca \
    --i-table ../data/core_diversity_healthy_r11800/rarefied_table.qza \
    --o-biplot ../data/rpca_healthy/healthy_rar_RPCA-ordination.qza \
    --o-distance-matrix ../data/rpca_healthy/healthy_rar_RPCA-dm.qza

qiime gemelli rpca \
    --i-table ../data/fecal_gg2_healthy.qza \
    --o-biplot ../data/rpca_healthy/healthy_non-rar_RPCA-ordination.qza \
    --o-distance-matrix ../data/rpca_healthy/healthy_non-rar_RPCA-dm.qza

qiime gemelli qc-rarefy \
    --i-table ../data/fecal_gg2_healthy.qza \
    --i-rarefied-distance ../data/rpca_healthy/healthy_rar_RPCA-dm.qza \
    --i-unrarefied-distance ../data/rpca_healthy/healthy_non-rar_RPCA-dm.qza \
    --o-visualization ../data/rpca_healthy/healthy_RPCA-rarefy-qc.qzv

fecal_gg2_healthy.qza is the file that was used to run core-diversity, which created the rarefied table.

Error Message:

Plugin error from gemelli:

  The ID '15475.8P3RSQ' is not in the dissimilarity matrix.

Debug info has been saved to /var/folders/3b/vgfxgx4j1hn384yg_whm02qc0000gp/T/qiime2-q2cli-err-qfei4m9o.log

Log file:

Traceback (most recent call last):
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-74>", line 2, in qc_rarefy
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 615, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/gemelli/q2/_visualizer.py", line 37, in qc_rarefy
    samp_sum_dist) = _qc_rarefaction(table,
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/gemelli/utils.py", line 104, in qc_rarefaction
    rarefied_distance = rarefied_distance.filter(unrarefied_distance.ids)
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/skbio/stats/distance/_base.py", line 415, in filter
    idxs = [self.index(id_) for id_ in ids]
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/skbio/stats/distance/_base.py", line 415, in <listcomp>
    idxs = [self.index(id_) for id_ in ids]
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/skbio/stats/distance/_base.py", line 340, in index
    raise MissingIDError(lookup_id)
skbio.stats.distance._base.MissingIDError: The ID '15475.8P3RSQ' is not in the dissimilarity matrix.

I mean, if it is saying that the sample is not in the rarefied dataset, that is correct (I double-checked): it was rarefied out. Several other samples were too, though. That one is the first in the feature table, so I am not sure whether it just stopped checking after it found one. Unfortunately, we have agreements that do not allow me to share the original files publicly, but I can provide them privately (Slack, email, etc.).
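The traceback suggests why only one ID is named: the filter step looks up each requested ID in turn inside a list comprehension, and the first failed lookup raises, so any later missing IDs are never reached. Below is a minimal pure-Python sketch of that behaviour. `TinyMatrix`, `first_error`, and every sample ID other than '15475.8P3RSQ' are hypothetical stand-ins for illustration, not scikit-bio's actual classes.

```python
class MissingIDError(Exception):
    """Hypothetical stand-in for the lookup error seen in the traceback."""


class TinyMatrix:
    """Toy ID container for illustration; not scikit-bio's DistanceMatrix."""

    def __init__(self, ids):
        self.ids = tuple(ids)

    def index(self, lookup_id):
        # Raise as soon as a single ID cannot be found.
        if lookup_id not in self.ids:
            raise MissingIDError(
                f"The ID '{lookup_id}' is not in the dissimilarity matrix."
            )
        return self.ids.index(lookup_id)

    def filter(self, ids):
        # The comprehension aborts at the first failing lookup,
        # so only the first missing ID is ever reported.
        idxs = [self.index(id_) for id_ in ids]
        return TinyMatrix(self.ids[i] for i in idxs)


def first_error(rarefied_ids, unrarefied_ids):
    """Return the error message raised by the filter, or None on success."""
    try:
        TinyMatrix(rarefied_ids).filter(unrarefied_ids)
    except MissingIDError as err:
        return str(err)
    return None


# Two hypothetical samples were rarefied out, but only the first is named.
message = first_error(
    rarefied_ids=["s1", "s2"],
    unrarefied_ids=["15475.8P3RSQ", "s1", "low_depth_2", "s2"],
)
print(message)
```

In this sketch `low_depth_2` is also missing from the rarefied IDs, yet the message mentions only the first missing ID, which matches what was observed above.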

cameronmartino commented 8 months ago

Thanks for reporting, @callaband. Are the samples in ../data/fecal_gg2_healthy.qza and ../data/core_diversity_healthy_r11800/rarefied_table.qza the same? They need to be. The easiest way is to filter out the samples in ../data/fecal_gg2_healthy.qza whose depth is below the sequencing depth used to rarefy ../data/core_diversity_healthy_r11800/rarefied_table.qza before running the second RPCA command on the non-rarefied data.
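The matching step can be sketched in plain Python, assuming per-sample total counts are available as a dictionary. The function names, sample IDs, and counts are hypothetical. The depth of 11800 is an assumption read from the `r11800` in the rarefied table's path, and the sketch also assumes that rarefaction drops exactly those samples whose total is below the depth.

```python
def keep_samples_at_depth(sample_totals, depth):
    """Return IDs whose total count is at least `depth`, preserving order."""
    # Assumption: rarefaction drops samples whose total is below the depth.
    return [sid for sid, total in sample_totals.items() if total >= depth]


def unmatched_ids(unrarefied_ids, rarefied_ids):
    """Return IDs present in the unrarefied set but absent from the rarefied one."""
    rarefied = set(rarefied_ids)
    return [sid for sid in unrarefied_ids if sid not in rarefied]


# Hypothetical per-sample totals; only the depth mirrors the thread.
totals = {"s1": 25000, "s2": 11800, "s3": 9000, "s4": 400}

kept = keep_samples_at_depth(totals, depth=11800)
dropped = unmatched_ids(totals, kept)

print("kept:", kept)
print("dropped:", dropped)
```

Listing every unmatched ID up front, rather than stopping at the first one, makes it easier to confirm that the two tables agree before computing distances.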

callaband commented 8 months ago

Ah, okay, so I need to remove any samples that were ultimately rarefied out prior to running that command so that they match up appropriately - did not do that the first time...

Here's what I changed the code to (in case others have this question):

qiime gemelli rpca \
    --i-table ../data/core_diversity_healthy_r11800/rarefied_table.qza \
    --o-biplot ../data/rpca_healthy/healthy_rar_RPCA-ordination.qza \
    --o-distance-matrix ../data/rpca_healthy/healthy_rar_RPCA-dm.qza

qiime feature-table filter-samples \
    --i-table ../data/fecal_gg2_healthy.qza \
    --p-min-frequency 11800 \
    --o-filtered-table ../data/fecal_gg2_healthy_filt.qza

qiime gemelli rpca \
    --i-table ../data/fecal_gg2_healthy_filt.qza \
    --o-biplot ../data/rpca_healthy/healthy_non-rar_RPCA-ordination.qza \
    --o-distance-matrix ../data/rpca_healthy/healthy_non-rar_RPCA-dm.qza

qiime gemelli qc-rarefy \
    --i-table ../data/fecal_gg2_healthy_filt.qza \
    --i-rarefied-distance ../data/rpca_healthy/healthy_rar_RPCA-dm.qza \
    --i-unrarefied-distance ../data/rpca_healthy/healthy_non-rar_RPCA-dm.qza \
    --o-visualization ../data/rpca_healthy/healthy_RPCA-rarefy-qc.qzv

Okay, now the error [Yay for new error?] is:

Plugin error from gemelli:

  [Errno 2] No such file or directory: '/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/gemelli/q2/qc_assests/index.html'

Debug info has been saved to /var/folders/3b/vgfxgx4j1hn384yg_whm02qc0000gp/T/qiime2-q2cli-err-x4orzkdj.log

Log file:

Traceback (most recent call last):
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-74>", line 2, in qc_rarefy
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 615, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/gemelli/q2/_visualizer.py", line 83, in qc_rarefy
    q2templates.render(index, output_dir, context=context)
  File "/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2templates/_templates.py", line 44, in render
    with open(source_file, 'r') as fh:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/gemelli/q2/qc_assests/index.html'

I tried re-installing gemelli, but it says everything is installed already.

I looked, and the /Users/username/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/gemelli/q2 directory exists, but the error is correct: there is no qc_assets folder present (the only directory there is called 'tests').
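A manual check like this one can also be scripted. The sketch below uses a hypothetical helper, `missing_files`, and runs against a throwaway directory that mimics an install containing only `tests`; it does not touch a real gemelli installation. The relative path is spelled as it appears in the error message.

```python
from pathlib import Path
import tempfile


def missing_files(package_dir, relative_paths):
    """Return the relative paths that do not exist under `package_dir`."""
    root = Path(package_dir)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


# Demo on a temporary directory that mimics an install with only `tests`.
with tempfile.TemporaryDirectory() as tmp:
    q2_dir = Path(tmp) / "gemelli" / "q2"
    (q2_dir / "tests").mkdir(parents=True)
    wanted = ["qc_assests/index.html"]  # spelling as it appears in the error
    before = missing_files(q2_dir, wanted)

    # Simulate a package that does ship the template.
    (q2_dir / "qc_assests").mkdir()
    (q2_dir / "qc_assests" / "index.html").write_text("<html></html>")
    after = missing_files(q2_dir, wanted)

print(before)
print(after)
```

For a real environment, the package directory could be located with `importlib.util.find_spec("gemelli").submodule_search_locations` and passed to the same helper, assuming gemelli is importable there.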

cameronmartino commented 8 months ago

This seems like a packaging issue. I will update the PyPI release soon with this packaged, which should solve the problem. I will let you know when that is done!

ARW-UBT commented 6 months ago

Hello @cameronmartino, I got the same error (see below) using qiime gemelli qc-rarefy with the QIIME 2 RPCA CLI tutorial data. Has the packaging issue mentioned above been solved, and should I try to re-install gemelli?

FileNotFoundError: [Errno 2] No such file or directory: '/home/bt140047/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/gemelli/q2/qc_assests/index.html'

Best,

cameronmartino commented 5 months ago

Hi @callaband and @ARW-UBT,

This is now fixed in the latest version (v. 0.0.11). Let me know if you run into any further issues. Thank you for reporting and using Gemelli!

Cheers,

Cameron
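For anyone landing here later, one way to confirm that an environment has at least the release named above is to compare version numbers. This is only a sketch: the helper names are hypothetical, the PyPI distribution name is assumed to be `gemelli`, and versions are assumed to be plain dotted integers such as `0.0.11`.

```python
from importlib import metadata


def version_tuple(version):
    """Turn a plain dotted version such as '0.0.11' into a tuple of ints."""
    # Assumption: no suffixes such as 'rc1' or '.dev0' are present.
    return tuple(int(part) for part in version.split("."))


def has_fix(installed, fixed_in="0.0.11"):
    """True if `installed` is at least the release named in the thread."""
    return version_tuple(installed) >= version_tuple(fixed_in)


def installed_version(dist_name="gemelli"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


current = installed_version()
if current is None:
    print("gemelli is not installed in this environment")
elif has_fix(current):
    print(current, "is at or above 0.0.11")
else:
    print(current, "predates 0.0.11; upgrading may be needed")
```

Comparing tuples of integers rather than raw strings matters here, because as strings "0.0.9" would sort after "0.0.11".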