RNAcentral / rnacentral-import-pipeline

RNAcentral data import pipeline
Apache License 2.0

Improve how new IDs are submitted to LitScan #194

Closed carlosribas closed 8 months ago

carlosribas commented 8 months ago

The way I'm submitting new IDs to LitScan is not very efficient. At the very least, the way I create the file of new IDs needs to be reviewed. The Python script below performed much better than the bash script currently in use:

def read_ids(file_path):
    # Load one ID per line into a set so membership checks are fast.
    with open(file_path, "r") as file:
        return set(line.strip() for line in file)

def filter_ids(all_ids, new_ids):
    # Keep the original-case IDs whose lowercase form appears in new_ids.
    # This only matches if new_ids.txt holds lowercase IDs.
    return [item for item in all_ids if item.lower() in new_ids]

def write_results(filtered_ids, output_file):
    # Write one ID per line.
    with open(output_file, "w") as file:
        for item in filtered_ids:
            file.write(item + "\n")

def main():
    all_ids = read_ids("all_ids.txt")
    new_ids = read_ids("new_ids.txt")
    filtered_ids = filter_ids(all_ids, new_ids)
    write_results(filtered_ids, "results.txt")

if __name__ == "__main__":
    main()
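
If the file with all IDs gets large, a streaming variant could avoid holding it in memory. The sketch below is only an illustration, not the script that was benchmarked: the function names `load_lowercase_ids` and `stream_matching_ids` are hypothetical, and it assumes that lowercasing both sides of the comparison, skipping blank lines, and keeping the input order of the all-IDs file are all acceptable.

```python
def load_lowercase_ids(path):
    """Load the (smaller) file of new IDs into a set, normalised to lowercase.

    Assumption: case-insensitive matching is what is wanted, so both sides
    of the comparison are lowercased.
    """
    with open(path, "r") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def stream_matching_ids(all_ids_path, new_ids, output_path):
    """Copy IDs from all_ids_path to output_path when they appear in new_ids.

    The all-IDs file is read line by line, so only new_ids is held in memory.
    Returns the number of IDs written.
    """
    written = 0
    with open(all_ids_path, "r") as source, open(output_path, "w") as target:
        for line in source:
            candidate = line.strip()
            # Skip blank lines; compare in lowercase but write the original case.
            if candidate and candidate.lower() in new_ids:
                target.write(candidate + "\n")
                written += 1
    return written
```

It would be called as `stream_matching_ids("all_ids.txt", load_lowercase_ids("new_ids.txt"), "results.txt")`. Unlike the set-based version, the output keeps the order (and any duplicates) of `all_ids.txt`.
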
carlosribas commented 8 months ago

This issue has been moved to Linear