SegataLab / panphlan

PanPhlAn is a strain-level metagenomic profiling tool for identifying the gene composition of individual strains in metagenomic samples.
MIT License

panphlan_download_pangenome.py: panphlan_pangenomes_links.tsv written to cwd #25

Closed: nick-youngblut closed this issue 3 years ago

nick-youngblut commented 3 years ago

panphlan_download_pangenome.py downloads the panphlan_pangenomes_links.tsv file to the current working directory:

def find_url(query, verbose):
    if verbose:
        print("Retrieving mapping file...")
    mapping_file = os.path.basename(DOWNLOAD_URL).replace('?dl=1', '')
    download(DOWNLOAD_URL, mapping_file, overwrite=True, verbose=False)
    IN = open(mapping_file, mode='r')

If two panphlan_download_pangenome.py jobs run in the same directory (e.g., when someone downloads many pangenomes in parallel), one job can overwrite panphlan_pangenomes_links.tsv while another job is reading it via IN = open(mapping_file, mode='r'), which causes the reading job to crash.
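One possible way to avoid the race, sketched under the assumption that the mapping-file contents have already been fetched into memory as bytes: write them to a uniquely named temporary file in the destination directory, then rename that file into place with os.replace. On POSIX filesystems the rename is atomic, so a concurrent reader sees either the old complete file or the new complete file, never a partially written one. write_atomically is a hypothetical helper, not part of PanPhlAn.

```python
import os
import tempfile


def write_atomically(data: bytes, dest_path: str) -> None:
    """Hypothetical helper: write `data` to `dest_path` without exposing a partial file.

    The bytes go to a uniquely named temporary file in the same directory,
    which is then renamed over the destination.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # mkstemp picks a unique name, so parallel jobs never share a temp file
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp_", suffix=".tsv")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic on POSIX when source and destination are on the same filesystem
        os.replace(tmp_path, dest_path)
    finally:
        # After a successful rename the temp path is gone; otherwise clean it up
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Keeping the temporary file in the destination directory matters: os.replace across filesystems is not atomic and can fail. On Windows, os.replace can fail if another process has the destination open, so giving each job its own file name may be the simpler option there.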

As a minor point, users would probably expect panphlan_pangenomes_links.tsv to be written to the output directory rather than to the current working directory.
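A minimal sketch of building the mapping-file path inside the output directory instead of the current working directory. mapping_file_path is a hypothetical helper, and the URL in the usage example is a placeholder, not the real DOWNLOAD_URL.

```python
import os
from urllib.parse import urlsplit


def mapping_file_path(download_url: str, output_dir: str) -> str:
    """Hypothetical helper: where to save the mapping file inside `output_dir`."""
    # Taking the basename of the URL path drops any query string (e.g. '?dl=1')
    # without relying on an exact string replace
    file_name = os.path.basename(urlsplit(download_url).path)
    # Make sure the output directory exists before anything is written to it
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, file_name)


# Usage with a placeholder URL:
# mapping_file_path("https://example.com/s/abc/panphlan_pangenomes_links.tsv?dl=1", "out")
# returns os.path.join("out", "panphlan_pangenomes_links.tsv")
```

Parsing the URL rather than calling .replace('?dl=1', '') keeps the file name correct even if the query string changes. Jobs writing to different output directories would also no longer share one copy of the file, which sidesteps the race described above for that common case.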