HaddingtonDynamics / Dexter

Return directory / folder listing when read_from_robot request has #43

Closed JamesNewton closed 5 years ago

JamesNewton commented 5 years ago

To support better access to the robot's file system via the socket interface, and to avoid the need for Samba, SFTP, or another file transfer system, it is important to be able to read directory listings from the robot.

We need to change the read_from_robot code in DexRun to look for a trailing / or \ in the request and if found, do an opendir / readdir instead of an fopen / fread. https://stackoverflow.com/questions/4204666/how-to-list-files-in-a-directory-in-a-c-program
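As a rough illustration of the trailing-separator check described above, here is a minimal sketch in C. The helper name `is_directory_request` is hypothetical and does not come from DexRun.c; it only shows the test that would choose between the opendir / readdir path and the existing fopen / fread path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of DexRun.c.
 * Returns true when the requested path ends in '/' or '\\', which the
 * proposal treats as "list this directory" rather than "read this file". */
bool is_directory_request(const char *path)
{
    assert(path != NULL);
    size_t len = strlen(path);
    if (len == 0) {
        return false;                 /* empty request: nothing to list */
    }
    char last = path[len - 1];
    return last == '/' || last == '\\';
}
```

A caller would presumably run this once on the first block of a request and remember the answer, since later blocks do not carry the path.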

Our current setup will need some adjustment to comply: https://github.com/HaddingtonDynamics/Dexter/blob/master/Firmware/DexRun.c#L2658

Since the blocks are sent in multiple socket buffers, one after the other, and returning a block that isn't full is the signal to DDE that the transfer is complete, we have to do several things:

  1. Hold a flag that the current read is of a directory, not a file. We only get the requested path on the first request, so we don't have it when the second one comes in. Not too hard: we can just have a wdp variable of type DIR * and check whether wfp or wdp is non-zero for the subsequent blocks.

  2. Return full blocks until all the files are listed. Rather than reading all the file names in to a RAM buffer on the first request, we can pad the returned file name out to the buffer size (MAX_CONTENT_CHARS) and return each file as each block is requested. e.g. block 1 is the first file name, padded to MAX_CONTENT_CHARS, then when block 2 is requested, we send the second file name in the folder, and so on.

  3. When we have returned all the files / folder names in the directory, we need to signal the end. The C code won't really know that it's sent the last name (it returns null when you try to read the next name and it's at the end) so we will end up returning an empty block when DDE requests the next file after the last file was returned. That will stop DDE and adding a null block to the return string shouldn't cause a problem. https://github.com/cfry/dde/blob/300634dbb73f4932e1f6352e83efb3894a913e5b/instruction.js#L3289

  4. DDE will have this string with a bunch of spaces after each name so it will need to trim them for display / selection.
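Steps 2 and 3 above could be sketched roughly as follows. This is an illustration, not the DexRun implementation: the function name `pad_name_block` is hypothetical, and the value 62 for `MAX_CONTENT_CHARS` is an assumption taken from the 62-byte blocks mentioned later in this thread. Names longer than one block are simply cut off here, which a real version would have to handle.

```c
#include <assert.h>
#include <string.h>

#define MAX_CONTENT_CHARS 62   /* assumed block size, for illustration only */

/* Hypothetical helper: builds the reply block for one directory entry.
 * `name` is the next entry name, or NULL when readdir has run out.
 * Returns the block length: a full block for a name, 0 at the end,
 * so the short (empty) block tells DDE the transfer is complete. */
size_t pad_name_block(const char *name, char *block)
{
    assert(block != NULL);
    if (name == NULL) {
        return 0;                         /* end of directory: empty block */
    }
    size_t len = strlen(name);
    if (len > MAX_CONTENT_CHARS) {
        len = MAX_CONTENT_CHARS;          /* sketch only: long names are cut */
    }
    memcpy(block, name, len);
    memset(block + len, ' ', MAX_CONTENT_CHARS - len);   /* pad with spaces */
    return MAX_CONTENT_CHARS;
}
```

On the DDE side, each received block would then be trimmed of trailing spaces, as step 4 describes.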

We could skip this and implement #20 instead, then use a script to return the directory listing.

cfry commented 5 years ago

It's worth it to do some brainstorming here about the things we might want to do in the future.

I like the idea of sending over a bash cmd and getting a result especially if the bash cmd operates with a consistent environment that maintains its current working directory so that the 2nd "cd" can take advantage of the first.

I'd also like to send over random JS, eval that on Dexter and get back the result.

James N has his chat server stuff so how does that all fit in?

Consider adopting the URL protocol syntax (and maybe some of the semantics), i.e.:

  file://foo/bar.txt gets file content
  file://foo/ gets directory listing
  js://(2 + 3) returns 5
  js://gripper_width() returns a number in meters of the gripper width

With the "js" protocol we just write a JS method to implement whatever is after the :// . I'm inclined to use that heavily and not invent another language. We could have js://bash("pwd") for instance, but admittedly bash://pwd has a certain appeal.

There is an existing "irc" protocol for chat, ie irc://...... Seems to me read_from_robot's "source" arg can be in the same "language" as used in the chat server.

The syntax of including arguments to a cmd in URLs of foo?arg1=111&arg2=222 is Tim or his misguided minions inventing yet another syntax for arg passing. They should have just used JS syntax, but they didn't even know enough to have the conversation. (Maybe JS wasn't fully defined at the time, but HTTP and HTML are still language disasters.)

I recommend returned values be in JSON, so our directory server could return an array of strings for instance.

I like the idea of keeping the basic strategy we use in read_from_robot for collecting large values into one. James N suggested sending over each file name in a directory query as one item. But what if we have a really long file name that can't fit in one? Or a bunch of short file names, many of which can fit in one packet? I say, just consider the result to be some sort of "virtual file" and send the whole thing over as if it were a file.

On the dexrun side, seems to me the easiest thing to do is make a "virtual file" of a result, then treat it just as you treat files now.

On the DDE end, I have the original "query" at the time of receiving the packets, including the last one, so I can use that original query to format the result, i.e. if it's js://foo I can assume the resultant virtual file is the source code for a JSON object, and call the JSON parser on it and return a nice structured value.

Since it's easier to format than to parse, standardizing on sending JSON rather than just sending random text and having its parsing be idiosyncratic to the query will be less code.

To be continued ...

This message isn't meant to convey a spec, just a conversation opener.

JamesNewton commented 5 years ago

> It's worth it to do some brainstorming here about the things we might want to do in the future. I like the idea of sending over a bash cmd and getting a result especially if the bash cmd operates with a consistent environment that maintains its current working directory so that the 2nd "cd" can take advantage of the first.

It can also be sent the folder we want the listing from as a parameter.

> I'd also like to send over random JS, eval that on Dexter and get back the result.

Yeah, the bash command line can also fire up node.js, or send a message to an existing node.js running in yet another thread.

> James N has his chat server stuff so how does that all fit in?

That's pretty much a separate deal, which still talks through DexRun. It runs in another thread. Basically you can think of it as an adapter cable. That's pretty much all it does.

> Consider adopting the URL protocol syntax (and maybe some of the semantics): ie: file://foo/bar.txt gets file content file://foo/ gets directory listing js://(2 + 3) returns 5 js://gripper_width() returns a number in meters of the gripper width With "js" protocol we just write a js method to implement whatever is after the :// . I'm inclined to use that heavily and not invent another language. We could have js://bash("pwd") for instance, but admittedly bash://pwd has a certain appeal. There is an existing "irc" protocol for chat, ie irc://...... Seems to me read_from_robot's "source" arg can be in the same "language" as used in the chat server. The syntax of including arguments to a cmd in URLs of foo?arg1=111&arg2=222 is Tim or his misguided minions inventing yet another syntax for arg passing. They should have just used JS syntax, but they didn't even know enough to have the conversation. (Maybe JS wasn't fully defined at the time, but HTTP and HTML are still language disasters.)

For the read_from_robot, at least for now, we want to keep the request in the old standard format because that's what our DexRun.c and the C compiler will understand. I'd rather not have to write a converter... Although, I guess it's just stripping off the "file:/" and keeping the "/dir/file"? I guess that isn't so hard. Just look for it and strip it if it's there. Then in the future, look for the "js:/" and "bash:/"... ok, not as bad an idea as I thought it was in the first second. But it takes another bit of work... have to remember to do it.
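The prefix handling discussed above might look something like the following sketch. Everything named here (`source_kind`, `classify_source`) is hypothetical. Note one assumption that differs slightly from the wording above: this version strips the whole "scheme://" prefix, so an absolute path would be written file:///dir/file, and a request with no prefix is treated as an old-style plain path.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical request kinds, named after the schemes floated in this thread. */
typedef enum { SRC_FILE, SRC_JS, SRC_BASH } source_kind;

/* Classifies a read_from_robot source string and points *rest at the text
 * after the "scheme://" prefix. A request without a known prefix is treated
 * as a plain file path, which keeps the old format working unchanged. */
source_kind classify_source(const char *request, const char **rest)
{
    static const struct { const char *prefix; source_kind kind; } schemes[] = {
        { "file://", SRC_FILE },
        { "js://",   SRC_JS   },
        { "bash://", SRC_BASH },
    };
    assert(request != NULL && rest != NULL);
    for (size_t i = 0; i < sizeof schemes / sizeof schemes[0]; i++) {
        size_t n = strlen(schemes[i].prefix);
        if (strncmp(request, schemes[i].prefix, n) == 0) {
            *rest = request + n;          /* skip the prefix */
            return schemes[i].kind;
        }
    }
    *rest = request;                      /* no prefix: old-style path */
    return SRC_FILE;
}
```

Only the SRC_FILE case would need to work at first; the other kinds could return an error until something on the robot can evaluate them.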

> I recommend returned values be in JSON, so our directory server could return an array of strings for instance.

Well, we can easily wrap each returned value in "[" and "]," and the response to the first one can be "[" and the last one "]" but that leaves a trailing comma on the last entry... Is that not valid JSON?

Hang on: if we return the first file name in the first response, we can pre-load a '[' in a temp variable, then return "[filename/ padding]" (no comma), then pre-load temp with "," on every following request, except the last one, where we replace the "," with a "]". So you get something like [[first ] ,[second ] ... ], which I think is valid, right? I.e. the commas don't have to be on the end. In case it isn't clear, we don't send the trailing comma because we have no way of knowing this is the last file until we get a null back from the OS when asking for the next file name in the directory.

> I like the idea of keeping the basic strategy we use in read_from_robot for collecting large values into one. James N suggested sending over each file name in a directory query as one item. But what if we have a really long file name that can't fit in one?

Then we send it in two, or three, or whatever. Just like a file. The only trick is we have to pad the last part of the file name so it doesn't stop the transfer. So we might send [[first ] ,[secondfilenamewhichisstupidlylo ngandshouldneverbeusedbyanyidiotw hoisnameingafile ] ]

> Or a bunch of short file names, many of which can fit in one packet?

Yes, it is less efficient, but this is about what's easy to do in the little C program on the little robot with limited memory. You big boys can deal with it on the 64 bit computer with gigs of ram and a language that is expected to garbage collect.

> I say, just consider the result to be some sort of "virtual file" and send the whole thing over as if it were a file. On the dexrun side, seems to me the easiest thing to do is make a "virtual file" of a result, then treat it just as you treat files now.

I thought about that, but when you have a very large directory, you have to allocate and then free that much RAM. Malloc and free are good to avoid in C programming. And in C++. And every other language when you are trying to do real time programming. Garbage collecting the heap introduces all sorts of nasty timing issues. We really, really don't want to do that if we can help it.

> On the DDE end, I have the original "query" at the time of receiving the packets, including the last one, so I can use that original query to format the result, i.e. if it's js://foo I can assume the resultant virtual file is the source code for a JSON object, and call the JSON parser on it and return a nice structured value. Since it's easier to format than to parse, standardizing on sending JSON rather than just sending random text and having its parsing be idiosyncratic to the query will be less code.

Yep, I can see the advantage of that. And I think the JSON format is easy enough to fake in C.

cfry commented 5 years ago

foo = [3, 4,] is valid in new JS, i.e. the trailing comma is OK. Cleaner is to just not use a comma on the first one, and precede all the others with a comma.

Don't wrap each file name in square brackets, wrap them in double quotes. We want a string.
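The leading-comma scheme with quoted names could be sketched in C along these lines. The function `json_append_name` is hypothetical. It uses the leading separator rather than a trailing comma because strict JSON, unlike a JS array literal, does not allow a trailing comma. The caller is assumed to append the closing ']' once readdir reports the end, and to send "[]" for an empty directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes one file name as a JSON string into `out`,
 * preceded by '[' for the first element or ',' for every later one.
 * Returns the number of bytes written (excluding the NUL), or 0 if the
 * worst-case escaped form might not fit in `cap` bytes. */
size_t json_append_name(char *out, size_t cap, const char *name, int first)
{
    assert(out != NULL && name != NULL);
    /* Worst case: every byte becomes a 6-byte \uXXXX escape, plus the
     * separator, two quotes and the terminating NUL. */
    if (strlen(name) * 6 + 4 > cap) {
        return 0;
    }
    size_t n = 0;
    out[n++] = first ? '[' : ',';
    out[n++] = '"';
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        if (*p == '"' || *p == '\\') {
            out[n++] = '\\';                  /* escape quote and backslash */
            out[n++] = (char)*p;
        } else if (*p < 0x20) {               /* control characters */
            n += (size_t)snprintf(out + n, cap - n, "\\u%04x", (unsigned)*p);
        } else {
            out[n++] = (char)*p;
        }
    }
    out[n++] = '"';
    out[n] = '\0';
    return n;
}
```

Because nothing is ever written after an element, the sender never needs to know in advance which name is the last one.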

On the malloc and ram requirements, how about if there's just a special temp file and you just write/pipe the JSON to that, then when all done, send over the whole file however you send over files now. When you START the read_from_robot just clear the content of the file. Presuming we only have one such request at a time, this shouldn't be tough.

If all the formatting work on the DexRun side is hard, just pipe the output of "ls" to a file, then send the file. Same for every other bash cmd. I can parse it on DDE's side; it's not that hard. Although generally it's easier to format on output than parse on input, we have a special case here where it's harder to work in C on Dexter, so that might be worth breaking my "rule" for :-) .

"ls" has a ton of options. We might want to use some of them, i.e. file size & write date being the most generally useful. Let's not do recursive descent; that's for the user of DDE to choose.

More important than these details is the larger "protocol" stuff, evaling bash and JS and whatever else we decide on. I'm expecting there to be a node running on Dexter. Is that reasonable? A special cmd might "boot" it.

JamesNewton commented 5 years ago
  1. Use 'r' command with block 0 and a directory name which must end in a '/' to indicate that it's a directory read vs a file read. On block 0, call opendir with the path specified by the oplet, and then call readdir to get a dirent structure with the next entry's name, then call stat with the concatenated path and entry name to get the type, size, dates, etc., then unpack that as JSON into a buffer string of sufficient size to avoid overrun. This size would be 61 + the maximum size of a JSON entry for a directory entry, e.g. the max entry name size, file size, dates, etc. Keep a pointer to the next free byte in the buffer. Pre-load that buffer with an opening '{'. Return the first 62 bytes of the buffer in the expected 'r' format. If there are fewer than 62 bytes in the buffer, readdir another dirent and stat and unpack again. After copying that first 62 bytes into the reply, shift the buffer contents down 62 bytes so the next char to send is at the start of the buffer and point to the next free byte. The buffer, its pointer, the dirent, and the dir handle must all be static across multiple calls.

  2. Use 'r' oplet with non-zero block number (can increment, doesn't matter) to trigger reading of the next 62 bytes from the buffer. Anytime the buffer drops below 62 bytes, readdir, stat, and unpack another entry's data.

  3. When readdir returns NULL (end of the directory, or an error), we know that we've read all the data, so closedir, and that becomes a flag to not readdir again (the dir handle being null). Append a closing '}' to the buffer, then just send back whatever data is left in the buffer until it is empty.

It should be obvious that doing a directory read in C is NOT a simple thing. Compare that to node.js which can do it in 22 lines of code with error checking. https://code-maven.com/list-content-of-directory-with-nodejs

cfry commented 5 years ago

Gad, 22 lines of code for a measly directory listing? That JS library must have been written by a C programmer :-( Have you actually implemented this yet? I'd be interested to see an actual example of the returned dir listing.

Here's a wild idea if you haven't already implemented it: Fire up node.js and just use that node code. Write it to a file, then "redirect" that read_from_robot("foo/bar/") to read_from_robot("dir_listing_temp.json")


JamesNewton commented 5 years ago

Or we could just do this via #20

JamesNewton commented 5 years ago

Use #20

JamesNewton commented 3 years ago

Or better yet, do it via the node server web editor support. See wiki for node js server.

JamesNewton commented 3 years ago

Kamino cloned this issue to HaddingtonDynamics/OCADO