accre / lstore

LStore - A fault-tolerant, performant distributed data storage framework.
http://www.lstore.org
Apache License 2.0

Clean slate of gridftp plugin #123

Closed PerilousApricot closed 7 years ago

PerilousApricot commented 8 years ago

Cut a PR of my WIP branch of gridftp plugin.

PerilousApricot commented 8 years ago

@tacketar When implementing the directory listing functionality, it looks like the existing implementation with the old API uses some functions we didn't truly want exported. Any ideas on a smarter way to do this bit https://github.com/accre/lstore/pull/123/commits/c618c66e085bc6c898ee446a4e1fc77e2a081d3f#diff-c0c66d2069264325d31e8a6538146f9cR71 that lines up with this old code https://github.com/accre/lstore-gridftp/blob/e48fd883baaab34a5777446bcc797f3fd6f72d4c/src/gridftp_lfs_stat.c#L342 ?

tacketar commented 8 years ago

Specifically what bits are you concerned about? The line # 238 is part of the API and should be. All the LIO object iterators should be part of the API.

PerilousApricot commented 8 years ago

_lio_parse_stat_vals was prefixed with an underscore, which I assumed implied it was supposed to be hidden.

tacketar commented 8 years ago

I also see a bunch more names that need mangling :(

PerilousApricot commented 8 years ago

Hopefully going through and documenting things will fish out the rest of those little mistakes.

It's also kinda clunky to need to make a regex/glob to list the files in a directory (what if the filename already has regex/glob special characters?). Is there a more straightforward way to grab an iterator?

tacketar commented 8 years ago

Yeah but it's not that big to publish either. It's just a parsing function that has a specific use case, namely for lio_fuse, that's being reused. It doesn't do anything fancy with an internal structure. Just copies the LIO attributes into a system "stat" structure.
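As a rough illustration of what "copies the LIO attributes into a system stat structure" can look like, here is a minimal sketch. It is not the actual _lio_parse_stat_vals code: the sketch_attr_t type, the sketch_parse_stat_vals() name, and the "size" and "mtime" attribute keys are all hypothetical, chosen only to show the shape of such a helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical key/value attribute pair; not an LStore type. */
typedef struct {
    const char *key;
    const char *val;
} sketch_attr_t;

/* Hypothetical helper: copy recognized attributes into *st.
 * The keys "size" and "mtime" are assumptions made for this sketch.
 * Returns the number of attributes that were used. */
int sketch_parse_stat_vals(const sketch_attr_t *attrs, size_t n, struct stat *st)
{
    int used = 0;

    assert(attrs != NULL && st != NULL);
    memset(st, 0, sizeof(*st));

    for (size_t i = 0; i < n; i++) {
        if (attrs[i].key == NULL || attrs[i].val == NULL) continue;
        if (strcmp(attrs[i].key, "size") == 0) {
            st->st_size = (off_t)strtoll(attrs[i].val, NULL, 10);
            used++;
        } else if (strcmp(attrs[i].key, "mtime") == 0) {
            st->st_mtime = (time_t)strtoll(attrs[i].val, NULL, 10);
            used++;
        }
        /* Unrecognized keys are skipped without error. */
    }
    return used;
}
```

Nothing here touches an internal structure, which is the point being made above: the helper only translates values from one representation to another.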

PerilousApricot commented 8 years ago

I don't doubt that it is, I just interpreted the leading underscore as "do not use".

tacketar commented 8 years ago

There are helper functions to do that. It's pretty easy if I remember correctly. Got to run to a meeting but I can look at it later.

tacketar commented 8 years ago

The leading underscore has 2 meanings: 1) it's for internal use in the file (like a static), and 2) it may not be threadsafe if it touches a global. If thread safety is an issue, it's mentioned in the fn description.
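A small sketch of that convention, assuming a made-up request counter (none of these names come from the LStore source): the underscore-prefixed function is file-local and touches a global without locking, and the public wrapper is the thread-safe entry point.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical global state shared by every caller in this file. */
static int request_count = 0;
static pthread_mutex_t request_lock = PTHREAD_MUTEX_INITIALIZER;

/* Leading underscore: internal to this file and NOT thread-safe,
 * because it modifies request_count without taking request_lock.
 * The caller must already hold request_lock. */
static int _request_count_bump(void)
{
    request_count++;
    return request_count;
}

/* Public entry point: takes the lock around the internal helper. */
int request_count_next(void)
{
    int n;

    pthread_mutex_lock(&request_lock);
    n = _request_count_bump();
    pthread_mutex_unlock(&request_lock);

    assert(n > 0);
    return n;
}
```

One caveat: the C standard reserves file-scope identifiers that begin with an underscore for the implementation, so this naming scheme is a project convention rather than something the language endorses.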

tacketar commented 8 years ago

Look at https://github.com/accre/lstore/blob/master/src/lio/bin/lio_ls.c#L201-L204 for an example.

lio_path_resolve() will take an scp-type path (user@host:/path) and turn it into a tuple containing the path and the user creds. It will also translate FUSE mount paths, reparse them for L-Store, and set "is_lio" in the tuple to indicate whether it's an L-Store path or a local path. For gridftp this is probably overkill but will work just the same.

lio_path_wildcard_auto_append() will take the tuple, look at the path, and append a wildcard if the path ends in a "/". It assumes you have a glob, which is the normal way paths are specified from the command line.

lio_os_path_glob2regex() converts the glob to a regex for passing to the different object iterators.
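To make the wildcard-append and glob-to-regex steps concrete, here is a minimal, self-contained sketch of the idea. It is not the LStore implementation: sketch_wildcard_auto_append() and sketch_glob2regex() are hypothetical stand-ins with made-up signatures, and the decision to keep "*" and "?" from matching "/" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: return a newly allocated copy of path with "*"
 * appended when path ends in '/'. The caller frees the result. */
char *sketch_wildcard_auto_append(const char *path)
{
    assert(path != NULL);
    size_t n = strlen(path);
    int add = (n > 0 && path[n - 1] == '/');
    char *out = malloc(n + (add ? 1 : 0) + 1);

    if (out == NULL) return NULL;
    memcpy(out, path, n);
    if (add) out[n++] = '*';
    out[n] = '\0';
    return out;
}

/* Hypothetical stand-in: translate a glob into an anchored regex.
 * '*' becomes "[^/]*", '?' becomes "[^/]", and regex metacharacters are
 * backslash-escaped so they match literally. Glob character classes are
 * not supported; '[' is treated as a literal. The caller frees the result. */
char *sketch_glob2regex(const char *glob)
{
    assert(glob != NULL);
    size_t n = strlen(glob);
    /* Worst case: every input char expands to 5 chars, plus '^', '$', NUL. */
    char *out = malloc(5 * n + 3);
    size_t j = 0;

    if (out == NULL) return NULL;
    out[j++] = '^';
    for (size_t i = 0; i < n; i++) {
        char c = glob[i];
        if (c == '*') {
            memcpy(out + j, "[^/]*", 5);
            j += 5;
        } else if (c == '?') {
            memcpy(out + j, "[^/]", 4);
            j += 4;
        } else {
            if (strchr(".^$+()[]{}|\\", c) != NULL) out[j++] = '\\';
            out[j++] = c;
        }
    }
    out[j++] = '$';
    out[j] = '\0';
    return out;
}
```

With these stand-ins, "/data/dir/" becomes the glob "/data/dir/*" and then the regex "^/data/dir/[^/]*$". The escaping step only covers regex metacharacters; a file name that itself contains "*" or "?" would still need glob-level escaping before this step, which is the case raised earlier in the thread.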

PerilousApricot commented 8 years ago

I dunno. I get the feeling from trying to work anew with the API that either the API is too broad or I can't keep a sufficient number of concepts in my head at the same time. Or, almost certainly, I broke it so things don't make concise sense. It might also just be a matter of some nice documentation being able to help things click.

Like I said, the fault is almost certainly my own, but even as a person somewhat familiar (?) with the codebase, I'm struggling to grok anything past the toolbox layer :(

tacketar commented 8 years ago

No, I don't think it's you. The toolbox is pretty much Data Structures 101. Most users will only need to deal with the GOP at the high level to just manage tasks. If they write their own tasks, they would most likely use the threadpool task framework. They won't need the socket-based and MQ (MQ base, streams, and ongoing) tasks when doing pure LIO stuff; that's only needed if you write extensions. I could pretty easily write some high-level docs providing an overview of the base GOP and the different types of tasks supported (thread, socket, MQ) and then write an example program using base GOP tools (gop_op_t, opque_t, gop_waitany(), gop_waitall(), etc.) and threadpool tasks. Would that help?

Likewise, in the LIO I could add some docs for the core objects (OS, RS, DS, segment) describing their respective "class" APIs. This gives a feel for what's going on behind the scenes, although most people just using lio will mainly use the higher-level lio_* API. Most of these calls translate to POSIX-type calls (open, close, read, write, create, move, truncate) and are not confusing. If you understand the GOP framework then using them becomes easy.

The object and attribute iterators don't directly map to POSIX calls and could definitely use some more explaining along with a few documented examples.

I'd like to get a core set of things documented to make it easier for others so I can move on to getting the IBP server refactored and adding LevelDB. Are there more core things you can think of that would be useful?