andersen-lab / Freyja

Depth-weighted De-Mixing
BSD 2-Clause "Simplified" License

Requesting help with understanding and usage of `--confirmedonly` flag #200

Closed: NickD-BHSc closed this issue 5 months ago

NickD-BHSc commented 5 months ago

Hi there,

I'd like to confirm whether I'm using the --confirmedonly flag properly, since my lab wants to exclude unconfirmed lineages from our data analysis.

I've looked through the code in _cli.py, and it looks like the --confirmedonly flag excludes barcode lineages whose ID contains 'proposed' or 'misc': https://github.com/andersen-lab/Freyja/blob/143ffdc89f478579cfac900ccf4537ae58d60d2b/freyja/_cli.py#L70C5-L78C52

    if barcodes != '-1':
        df_barcodes = pd.read_csv(barcodes, index_col=0)
    else:
        df_barcodes = pd.read_csv(os.path.join(locDir,
                                  'data/usher_barcodes.csv'), index_col=0)
    if confirmedonly:
        confirmed = [dfi for dfi in df_barcodes.index
                     if 'proposed' not in dfi and 'misc' not in dfi]
        df_barcodes = df_barcodes.loc[confirmed, :]
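
The quoted branch boils down to a case-sensitive substring test on each row label of the barcode table. As a minimal sketch, assuming barcodes are a pandas DataFrame indexed by lineage ID (as `pd.read_csv(..., index_col=0)` produces), the same logic can be pulled out into a standalone function and tried on a small table. The function name `filter_confirmed`, the lineage IDs `proposed123` and `misc_recombinant`, and the mutation columns are hypothetical, chosen only for illustration. The sketch also shows that the filter leaves a table untouched when no IDs match.

```python
import pandas as pd


def filter_confirmed(df_barcodes: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the quoted --confirmedonly branch:
    keep only rows whose index (lineage ID) contains neither 'proposed'
    nor 'misc'."""
    confirmed = [dfi for dfi in df_barcodes.index
                 if 'proposed' not in dfi and 'misc' not in dfi]
    return df_barcodes.loc[confirmed, :]


# Hypothetical toy barcode table: rows are lineage IDs, columns are
# mutations, and 1/0 marks whether a lineage carries that mutation.
toy = pd.DataFrame(
    [[1, 0], [1, 1], [0, 1], [1, 0]],
    index=['BA.2', 'BA.5', 'proposed123', 'misc_recombinant'],
    columns=['C241T', 'A23403G'],
)

kept = filter_confirmed(toy)
print(kept.index.tolist())  # ['BA.2', 'BA.5']

# A table with no 'proposed'/'misc' IDs comes back unchanged,
# so passing the flag has no effect on it.
clean = toy.loc[['BA.2', 'BA.5'], :]
print(filter_confirmed(clean).equals(clean))  # True
```

Because `in` is a case-sensitive substring test, an ID such as `Proposed1` would survive the filter, while any ID that merely contains `misc` somewhere in its name would be dropped.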

However, when I looked through usher_barcodes.csv for examples of unconfirmed lineages, I couldn't find any whose ID contains 'proposed' or 'misc'. So I'm wondering whether I'm misunderstanding how the --confirmedonly flag works, or whether I'm passing the flag in the wrong place. The command I run is:

    freyja demix "<bam_input>.vcf.tsv" "<depths_input>.depths" --output "<bam_file>.demixed" --confirmedonly

Thanks for your help!

joshuailevy commented 5 months ago

Hey @NickD-BHSc,

The --confirmedonly flag is actually a bit of a relic: we made that filtering the default behavior a while back. Didn't realize it was still in the README. The "unconfirmed" lineages that we used to remove during an update run are now removed before the barcodes reach the user (the standard update just pulls pre-calculated barcodes). The only time the flag matters now is if you've built your own barcodes that include unconfirmed lineages.

Long story short, if you're using the barcodes provided by freyja update, you should be fine with or without the --confirmedonly flag (it won't do anything). I'll update the README to reflect this.
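
For anyone building their own barcodes, one way to see whether --confirmedonly would change anything is to list the IDs it would drop before running freyja demix. A minimal standard-library sketch, assuming the barcode CSV keeps lineage IDs in its first column (which is what `index_col=0` in the quoted code expects); `unconfirmed_ids` and the sample file contents are hypothetical and not part of Freyja:

```python
import csv
import os
import tempfile


def unconfirmed_ids(barcode_csv_path: str) -> list:
    """Hypothetical helper: return lineage IDs (first CSV column) that the
    --confirmedonly filter would drop, i.e. IDs containing 'proposed' or 'misc'."""
    with open(barcode_csv_path, newline='') as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row (mutation names)
        return [row[0] for row in reader
                if row and ('proposed' in row[0] or 'misc' in row[0])]


# Hypothetical custom barcode file with one confirmed and two unconfirmed IDs.
contents = ",A23403G\nBA.2,1\nproposed123,1\nmisc_recombinant,0\n"
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(contents)
    path = tmp.name

found = unconfirmed_ids(path)
print(found)  # ['proposed123', 'misc_recombinant']
os.remove(path)
```

An empty list means the flag would be a no-op for that file, which matches the expectation for the pre-calculated barcodes from freyja update.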

Thanks for bringing this up!

Josh

NickD-BHSc commented 5 months ago

@joshuailevy awesome, thanks for clarifying!