Open alchemistcai opened 2 months ago
Hi @alchemistcai,
Thanks for testing the code and pointing out the inconsistencies. They resulted from handling different fasta ID conventions while I was developing the pipeline. I will refactor the code so that a single function parses the fasta file consistently everywhere.
In the `get_esm_embedding.py>process_fasta`, `get_esm_if_embedding.py>embedding`, `data_process.py>prep_test_dataset`, and `utils.py>process_fasta_file` functions, fasta ids are parsed like:

I use `get_esm*_embedding.py` to generate embeddings (see `.npy`) from a fasta file like:

When I use `inference.py`, the id is parsed as `|sea` and the script fails. I adjusted `data_process.py` to make it work.

I suggest adding a `key` argument that takes a Callable object, so others can decide how to parse `rec.id`, like Python's `list.sort(key=None)`.
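
To make the suggestion concrete, here is a minimal sketch of what a shared parser with a `key` argument could look like. It is not the pipeline's actual code: the function name `parse_fasta`, the example headers, and the pure-Python parsing (instead of whatever reader the scripts use) are all assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional


def parse_fasta(
    lines: Iterable[str],
    key: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Map each record ID to its sequence (hypothetical helper).

    `key` receives the raw ID (header text up to the first whitespace,
    without the leading '>') and returns the ID to store. With key=None
    the raw ID is kept unchanged, mirroring list.sort(key=None).
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            raw_id = header[0] if header else ""
            current_id = key(raw_id) if key is not None else raw_id
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    # Store the last record.
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical input; the headers are made up for this example.
example = [
    ">sp|P12345|EXAMPLE_HUMAN some description",
    "MKTAYIAK",
    "QRQISFVK",
    ">plain_id",
    "MSEQ",
]

# Default behaviour: keep the raw ID.
default_ids = parse_fasta(example)

# Caller-supplied rule: use the middle field of a '|'-separated ID.
accession_ids = parse_fasta(
    example,
    key=lambda rid: rid.split("|")[1] if "|" in rid else rid,
)
```

If every script called one function like this with the same `key`, the ids used when writing the embeddings and the ids used in `inference.py` would come from the same rule, which is the consistency problem described above.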