gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

Running inside an Apache/PHP script #144

Closed: barotto closed this issue 3 years ago

barotto commented 3 years ago

Hi,

I'm trying to use the latest version (1.0.7) inside a PHP script with proc_open(). Unfortunately gnparser tries to create a configuration file inside the user's $HOME, which for an Apache server is not set, so I get this error:

2021/02/15 19:13:43 Cannot find home config directory: neither $XDG_CONFIG_HOME nor $HOME are defined.

If I run the command with something like this (PATHTOBIN is the path to the gnparser binary):

export HOME=PATHTOBIN && PATHTOBIN/gnparser ...

I get this instead:

2021/02/15 19:06:40 Creating config file: PATHTOBIN/.config/gnparser.yaml.
2021/02/15 19:06:40 Cannot create dir PATHTOBIN/.config/gnparser.yaml: mkdir PATHTOBIN/.config: permission denied.

I don't want to give the Apache process write permissions on any directory, so is there a way to run gnparser without a configuration file, as was possible with older versions? Or is there a command line argument to pass a config file name?

barotto commented 3 years ago

Ok, I fixed my own problem by running gnparser once as a normal user. It generated a gnparser.yaml file that I then put inside the PATHTOBIN/.config/ directory. After setting HOME=PATHTOBIN in the PHP command line, gnparser can now run normally. Thanks.
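
As an alternative to prefixing the command with `export HOME=...`, the variable can be handed to the child process through the environment argument of proc_open(). The following is only a sketch: `run_with_home` is a hypothetical helper name, and the fallback PATH value is an assumption. Note that the environment array replaces the child's whole environment, so anything else the command needs has to be listed too.

```php
<?php
// Hypothetical helper: run $cmd with HOME set only for that child process.
function run_with_home(string $cmd, string $home, &$stdout = null, &$stderr = null): int
{
    $spec = [
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr
    ];
    // The fifth argument REPLACES the child's environment entirely.
    // The fallback PATH below is an assumption for typical Unix systems.
    $env = [
        'HOME' => $home,
        'PATH' => getenv('PATH') ?: '/usr/bin:/bin',
    ];
    $proc = proc_open($cmd, $spec, $pipes, null, $env);
    if ($proc === false) {
        return -1; // the process could not be started
    }
    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);
    return proc_close($proc);
}
```

With this helper, the call would pass the gnparser command line as `$cmd` and PATHTOBIN as `$home`; the directory still has to contain a readable `.config/gnparser.yaml` for versions that expect one.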

dimus commented 3 years ago

Hmm, maybe having a config file was not a good idea. I am on the fence about its usefulness.

Someone else had a problem with it in a different scenario, so I will remove it.

dimus commented 3 years ago

v1.0.9 is out; it no longer has a config file.

@barotto, may I ask you for a simple PHP code snippet that uses pipes to access gnparser? I would like to add it to the README as an example.

barotto commented 3 years ago

Here's what I use in my program:

function exec_cmd($cmd, &$stdout=null, &$stderr=null)
{
    // Capture the child's stdout (1) and stderr (2) through pipes.
    $proc = proc_open($cmd, [
        1 => ['pipe','w'],
        2 => ['pipe','w'],
    ], $pipes);
    if ($proc === false) {
        return -1; // the process could not be started
    }
    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);
    // proc_close() returns the exit code of the process, or -1 on error.
    return proc_close($proc);
}

// BEWARE: $scientific_name comes from outside, so escape it for the shell.
$cmd = "gnparser -f compact -d " . escapeshellarg($scientific_name);
// Any non-zero exit code means gnparser did not run successfully.
if(exec_cmd($cmd, $stdout, $stderr) !== 0) {
    throw new Exception("program error");
}
$parsing_result = json_decode($stdout, true);

Although this is not a very efficient method for bulk parsing, it's fine for situations where there are just a couple of names to analyze, such as showing formatted text on a web page.

For thousands of names it's very slow, so I'll experiment with the RESTful API for situations like mass imports of species into a database.

dimus commented 3 years ago

Thank you @barotto. Yes, starting a new process for every name is time-consuming, but if all you need is occasional parsing, it will work well. What I was hoping for is a snippet where the parser process runs all the time and just takes name after name from STDIN, sending results to STDOUT.
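
A long-running parser of the kind described here might be sketched roughly as follows. This is not the code from this thread: `LineProcess` and `send` are hypothetical names, and the sketch assumes a Unix-like system and a child process that writes exactly one output line for each input line and flushes it promptly. Whether gnparser needs a streaming option to behave that way is an assumption to verify with `gnparser -h`.

```php
<?php
// Hypothetical helper: keeps one child process open and exchanges
// one line of input for one line of output.
final class LineProcess
{
    private $proc;
    private $pipes = [];

    public function __construct(string $cmd)
    {
        $spec = [
            0 => ['pipe', 'r'],              // child's stdin: names are written here
            1 => ['pipe', 'w'],              // child's stdout: results are read here
            2 => ['file', '/dev/null', 'a'], // discard stderr so it cannot block the child
        ];
        $proc = proc_open($cmd, $spec, $this->pipes);
        if ($proc === false) {
            throw new RuntimeException("cannot start: $cmd");
        }
        $this->proc = $proc;
    }

    // Send one name, return the matching output line without its newline.
    public function send(string $name): string
    {
        // One name per line: embedded line breaks would desynchronize the stream.
        $line = str_replace(["\r", "\n"], ' ', $name) . "\n";
        if (fwrite($this->pipes[0], $line) === false) {
            throw new RuntimeException('write to child process failed');
        }
        fflush($this->pipes[0]);
        $out = fgets($this->pipes[1]); // blocks until the child answers
        if ($out === false) {
            throw new RuntimeException('child process closed its output');
        }
        return rtrim($out, "\r\n");
    }

    // Close the pipes (the child sees EOF on stdin) and return its exit code.
    public function close(): int
    {
        fclose($this->pipes[0]);
        fclose($this->pipes[1]);
        return proc_close($this->proc);
    }
}
```

Usage could look like `$p = new LineProcess('gnparser -f compact -s');` followed by `json_decode($p->send('Homo sapiens'), true)` for each name, where the `-s` streaming flag is an assumption about the installed version. Reading with fgets() blocks forever if the child buffers its output, so a timeout via stream_select() would be a sensible addition for production use.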

barotto commented 3 years ago

I experimented a bit with the concept and ended up creating a singleton class that keeps the gnparser process open.

class_ScientificNameParser.txt

It's much faster and actually usable with big tables and thousands of names, so I'm satisfied with the results.

Definitely not a snippet though.

dimus commented 3 years ago

Very cool, thank you for your time. If you feel it is working well and would like to create a Gist on GitHub with it, I will add a link to your code in the README file.

barotto commented 3 years ago

It works well enough for me, so here it is (on my work-related account): https://gist.github.com/marcobrt/72b2a3d1b0649c1bf738c9fc88f74ec0

dimus commented 3 years ago

Thank you @barotto, I added a link to your gist to the README.