gnames / gndiff

GNdiff compares scientific names from two files
MIT License

file disk read vs other input options #28

Open abubelinha opened 4 months ago


Hi @dimus, this is more of a question than an issue.

My initial use of gndiff was comparing two files, so its .csv input design is perfect for that.

That just works.

Now I am wondering about other possible use cases that involve frequent, repetitive gndiff calls (also from Python). My concern is whether so many disk writes and reads of .csv files could or should be avoided.

Imagine my script is parsing a long list of new specimens to include in a museum collection. I might prefer to gndiff-match them one by one, for whatever reason (for example, my script might need to perform other intermediate tasks in a certain order before processing the next specimen name). So I would be passing gndiff a small source.csv with just one row, but many times over.
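
To make the scenario concrete, a minimal Python sketch of the one-row-per-call pattern might look like the following. It assumes gndiff is invoked as `gndiff source.csv reference.csv` and that the source file needs a `ScientificName` header column; both are assumptions to check against the gndiff README. `build_source_csv` and `match_one` are hypothetical helper names, not part of gndiff.

```python
import csv
import os
import subprocess
import tempfile


def build_source_csv(name, header="ScientificName"):
    """Write a one-row CSV for a single name and return its path.

    The header name is an assumption about what gndiff expects.
    """
    # delete=False keeps the file on disk after closing,
    # so that an external program can open it by path.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", newline="", delete=False, encoding="utf-8"
    ) as tmp:
        writer = csv.writer(tmp)
        writer.writerow([header])
        writer.writerow([name])
        return tmp.name


def match_one(name, reference_csv, cmd=("gndiff",)):
    """Run `<cmd> source.csv reference.csv` for one name and return stdout.

    The argument order is an assumption about the gndiff command line.
    """
    source_csv = build_source_csv(name)
    try:
        result = subprocess.run(
            [*cmd, source_csv, reference_csv],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return result.stdout
    finally:
        # Always remove the temporary one-row file, even if the call fails.
        os.remove(source_csv)
```

Each call to `match_one` costs one small file write, one process start, and one file read by the external tool, which is the overhead this question is about.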

In such a scenario, would it make sense to avoid creating a source.csv on disk (which means a Python file write plus a gndiff file read) and instead pass the source info as a parameter? Maybe this is already possible, although I am not sure what syntax I should try. Or maybe it doesn't make sense at all because the script's performance would be similar (i.e. the intermediate tasks are slower than the gndiff call).
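
One way to check whether the disk round trip matters at all is to time it in isolation. The sketch below measures only the Python side (writing a one-row CSV, reading it back, deleting it) and does not call gndiff; the `ScientificName` header and the helper name `disk_round_trip` are illustrative assumptions.

```python
import csv
import os
import tempfile
import timeit


def disk_round_trip(name):
    """Write a one-row CSV to disk, read it back, and delete it."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows([["ScientificName"], [name]])
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


if __name__ == "__main__":
    runs = 1000
    seconds = timeit.timeit(
        lambda: disk_round_trip("Pinus sylvestris L."), number=runs
    )
    print(f"{seconds / runs * 1e6:.0f} microseconds per write+read")
```

If the per-call figure is small compared with the time the intermediate tasks take, avoiding the temporary file would probably not change the overall run time much.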

Of course, I can always design my script to run all the gndiff-matching operations in advance. I am just thinking before scripting and I am not a professional, so don't take me too seriously.


Somewhat related to this, in #13 I suggested the possibility of using gndiff as a server (so we can run gndiff on one machine and call it from others). If that feature ever becomes possible, I wonder how such a server would work.

Just wondering