Unify the process output python + tcrmatch.cpp files

IEDB / TCRMatch


Unify the process output python + tcrmatch.cpp files #29

Closed acrinklaw closed 2 years ago

acrinklaw commented 3 years ago

Per the conversation in #25, unify the two scripts into a single tool and expand AIRR format support.

dmx2 commented 3 years ago

I'm wondering if the simplest way to do this is to rework the process_output.py script into a wrapper that calls the tcrmatch executable, takes the results, and outputs all the metadata in one call. We can benchmark this, but are you aware of any performance issues with running the executable through subprocess.run or os.system inside a Python script, compared with running it directly on the command line?
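A minimal sketch of the wrapper idea, for discussion only. It assumes the executable writes tab-separated rows to stdout; the helper name `run_and_parse` is hypothetical, and the command is passed in as a list so that no tcrmatch flags are assumed here. The demo uses the Python interpreter as a stand-in for the real binary.

```python
import subprocess
import sys


def run_and_parse(command, timeout=None):
    """Run `command` once and return its stdout as a list of tab-split rows.

    `command` is an argument list such as ["./tcrmatch", ...]; the actual
    arguments are left to the caller and are not assumed here.
    """
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    rows = []
    for line in completed.stdout.splitlines():
        if line.strip():  # skip blank lines
            rows.append(line.split("\t"))
    return rows


if __name__ == "__main__":
    # Stand-in command: the interpreter prints one tab-separated row.
    demo = [sys.executable, "-c", "print('seq_a\\tseq_b\\t0.98')"]
    print(run_and_parse(demo))
```

With `subprocess.run`, the child process is started once per call, so the launch cost should be small next to the matching work itself, though that is worth confirming in the benchmark mentioned above.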

schristley commented 2 years ago

Maybe. I'm not sure whether tcrmatch needs to be run in a special way on some systems because it uses OpenMP. I'm thinking of how MPI programs often need to be launched with mpirun. One idea is to have a simple shell script that runs tcrmatch and then runs the Python script.
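On the OpenMP question: OpenMP programs are normally started like any other executable, with no launcher comparable to mpirun, and the thread count is commonly controlled through the `OMP_NUM_THREADS` environment variable. A rough sketch of forwarding that variable from a Python wrapper follows. The helper name `run_with_threads` is hypothetical, and whether tcrmatch honors the variable or overrides it with its own thread setting is an assumption that would need checking.

```python
import os
import subprocess
import sys


def run_with_threads(command, num_threads):
    """Run `command` with OMP_NUM_THREADS set in the child's environment.

    Returns the child's stdout as a string. The parent environment is copied
    so that PATH and other variables remain available to the child.
    """
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(num_threads)
    completed = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return completed.stdout


if __name__ == "__main__":
    # Stand-in child process that reports the value it received.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_with_threads(child, 4))
```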

I'm personally fine with the two-step process; we see this all the time in our analysis workflows, where additional processing needs to be done on output files. One thing you want to prevent is re-running tcrmatch when the user wants two or more different output formats.
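One way to avoid the re-run, sketched under assumptions: keep the raw output on disk and derive every requested format from that file. The function names, the tab-separated layout, and the two example formats (TSV and JSON) are all hypothetical choices for illustration, not the project's actual outputs.

```python
import csv
import json
import subprocess
import sys
from pathlib import Path


def ensure_raw_output(command, raw_path):
    """Run `command` only if `raw_path` does not exist yet; return the path."""
    raw_path = Path(raw_path)
    if not raw_path.exists():
        completed = subprocess.run(
            command, capture_output=True, text=True, check=True
        )
        raw_path.write_text(completed.stdout)
    return raw_path


def read_rows(raw_path):
    """Read the cached tab-separated output into a list of rows."""
    with open(raw_path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def write_tsv(rows, out_path):
    """Write rows back out as tab-separated text."""
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


def write_json(rows, out_path):
    """Write rows as a JSON array of arrays."""
    Path(out_path).write_text(json.dumps(rows, indent=2))


if __name__ == "__main__":
    # Stand-in for the expensive step; it runs at most once per raw file.
    cmd = [sys.executable, "-c", "print('seq_a\\tseq_b\\t0.98')"]
    raw = ensure_raw_output(cmd, "raw_output.tsv")
    rows = read_rows(raw)
    write_tsv(rows, "result.tsv")
    write_json(rows, "result.json")
```

The existence check is deliberately simple. A real tool would probably also need to invalidate the cached file when the input or the parameters change.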

At the same time, while Python handles text processing simply, C++ isn't that much harder...

dmx2 commented 2 years ago

@acrinklaw @schristley Yeah, if we could just produce the same output using C++, that would be best.

I'm not as skilled with C++, but we could get it working.

schristley commented 2 years ago

I can take a stab at it too if you can describe the outputs you want.

dmx2 commented 2 years ago

Basically the same outputs that the Python script produces, just without needing the script.