misialq commented 1 year ago

This PR fixes the issue described on the Q2 forum: the metadata fetched for the requested run IDs contained erroneous entries for SRA samples without associated runs. When we request metadata from NCBI, we receive metadata not only for the samples that contain our run but also for other samples that are part of the same experiment package. Those entries are missing most of the required information and can (and should) be dropped from the final results DataFrame.
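
For illustration, the filtering idea can be sketched roughly as follows. This is a minimal sketch, not the actual change in `_efetch.py`: the function name `keep_requested_runs`, the record keys, and the placeholder IDs are all hypothetical, and plain dictionaries stand in for the DataFrame rows.

```python
def keep_requested_runs(records, requested_ids):
    """Return only the metadata records whose run ID was explicitly requested.

    `records` is a list of dicts; the "run_id" key is an assumed field name
    used for illustration only.
    """
    requested = set(requested_ids)
    # Records without a run (run_id is None or missing) and records for runs
    # that were not requested are both dropped by this membership check.
    return [rec for rec in records if rec.get("run_id") in requested]


# Hypothetical response for one experiment package: one requested run,
# one sample without any associated run, and one run nobody asked for.
records = [
    {"run_id": "RUN_A", "sample_id": "SAMPLE_1", "instrument": "Illumina MiSeq"},
    {"run_id": None, "sample_id": "SAMPLE_2", "instrument": None},
    {"run_id": "RUN_B", "sample_id": "SAMPLE_3", "instrument": "Illumina MiSeq"},
]

filtered = keep_requested_runs(records, ["RUN_A"])
```

After filtering, only the record for the requested run remains; the run-less sample entry and the unrequested run from the same package are gone.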

codecov[bot] commented 1 year ago

Codecov Report

Merging #147 (fc7f4fd) into main (3ede75c) will increase coverage by 0.04%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #147      +/-   ##
==========================================
+ Coverage   98.57%   98.61%   +0.04%     
==========================================
  Files          29       29              
  Lines        2943     2959      +16     
==========================================
+ Hits         2901     2918      +17     
+ Misses         42       41       -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| q2_fondue/entrezpy_clients/_efetch.py | `96.06% <100.00%> (+0.45%)` | :arrow_up: |
| q2_fondue/tests/test_efetch.py | `99.37% <100.00%> (+<0.01%)` | :arrow_up: |
| q2_fondue/tests/test_metadata.py | `99.69% <100.00%> (+0.01%)` | :arrow_up: |


misialq commented 1 year ago

Hey @adamovanja, I added the test you mentioned. Could you test this with some unrelated (small) set of IDs? Preferably sample IDs or some type other than runs, just to make sure all is OK (I already tried with project IDs). Thanks!

bokulich-lab / q2-fondue

FIX: metadata should only retain the runs that were requested #147
