Open 0div opened 1 month ago
Latest commit: 86262f1bc62183b0ac969290463b1e5cc06587e6
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
Click here to learn what changesets are, and how to add one.
Click here if you're a maintainer who wants to add a changeset to this PR
I pushed an edit that should sketch a type cleanup.
There are two unfinished parts:
It should be possible to typeguard this so there is no type assumption when dealing with the parameters.
const { path, writeOpts, writeFiles } = typeof pathOrFiles === 'string'
? { path: pathOrFiles, writeFiles: [{ data: dataOrOpts }], writeOpts: opts }
: { path: undefined, writeFiles: pathOrFiles, writeOpts: dataOrOpts }
const blobs = await Promise.all(writeFiles.map(f => new Response(f.data).blob()))
2. The problem here was actually there even in the original SDK code — the `EntryInfo` type assumption works because the returned types are exactly of the shape of `EntryInfo`, but I think we should property type these by creating a mapping. Also, this will not correctly return the array if I just passed an array with one file.
```ts
return files.length === 1 ? files[0] : files
Some edits I did:
filename
to path
in the list of files to make it the same as the argument name in a method that takes only oneAlso, your previous code was mostly correct; this is more of an improvement.
I also think the path
might be redundant as an API argument here because we can just send the file with the filename as usual. (I read the response in the envd infra PR, so not that sure here.)
Let's keep it in the envd API though, because one of the uses was that you can use it to ensure that the path people upload files to is fixated.
@ValentaTomas I addressed your comments and added extra tests for some edge cases.
One important thing to note, and I've added it as a code comment, is that we can't expect specified directories in path of multipart filename to be taken into consideration; I've tested it with and sure enough only file name is used as path, the rest of the path is stripped by the std lib's "mime/multipart"
when calling
pathToResolve = part.FileName()
One important thing to note, and I've added it as a code comment, is that we can't expect specified directories in path of multipart filename to be taken into consideration; I've tested it with and sure enough only file name is used as path, the rest of the path is stripped by the std lib's
"mime/multipart"
when callingpathToResolve = part.FileName()
Ok, this is a very good find — how do you think we should handle this? We want to be able to upload files with any path, but the stripping of paths might make sense to preserve, because it allows people to upload by link easily. This might require some changes to the envd.
One important thing to note, and I've added it as a code comment, is that we can't expect specified directories in path of multipart filename to be taken into consideration; I've tested it with and sure enough only file name is used as path, the rest of the path is stripped by the std lib's
"mime/multipart"
when callingpathToResolve = part.FileName()
Ok, this is a very good find — how do you think we should handle this? We want to be able to upload files with any path, but the stripping of paths might make sense to preserve, because it allows people to upload by link easily. This might require some changes to the envd.
If it's really important not to break the current API spec, we could add a custom field to the multipart dataform and look for it with some changes to envd
@ValentaTomas I started by adding multi file write support for sandbox_sync
to get your opinion on this approach before moving forward with _async
.
I went the OR
route when it comes to typing, as function overload is not natively supported in python. Thate being said, with some non-negligible overhead, it is kindof achievable, but im not enthralled by the idea, especially when thinking about previous work generating the api-ref automatically: i have a feeling it wouldn't play well with it.
For the overload I think you can use the same system as we already have with https://github.com/e2b-dev/E2B/blob/beta/packages/python-sdk/e2b/sandbox_sync/process/main.py#L106
It should be the same thing, right?
I also suggest naming and exporting all the types (from both SDKs). What do you say about having:
WriteData
— this is then union type for the data (string, bytes, etc)WriteEntry
— this is the type that contains both the path
and data: WriteData
fields. I'm thinking about what to do when you try to invoke write for multiple files and provide an empty array.
Logically, you might want to notify the user that nothing was written, but throwing an error might not be optimal. If you are generating the field to write, you need to explicitly check if the array is empty; otherwise, you will get an error.
In contrast to this, isn't writing 0 files a valid operation and you will also get an array with 0 results so everything is ok?
One important thing to note, and I've added it as a code comment, is that we can't expect specified directories in path of multipart filename to be taken into consideration; I've tested it with and sure enough only file name is used as path, the rest of the path is stripped by the std lib's
"mime/multipart"
when callingpathToResolve = part.FileName()
Ok, this is a very good find — how do you think we should handle this? We want to be able to upload files with any path, but the stripping of paths might make sense to preserve, because it allows people to upload by link easily. This might require some changes to the envd.
If it's really important not to break the current API spec, we could add a custom field to the multipart dataform and look for it with some changes to
envd
Yeah, I'm thinking that we should probably do this, because people are already using the Beta SDK.
I'm thinking about what to do when you try to invoke write for multiple files and provide an empty array.
Logically, you might want to notify the user that nothing was written, but throwing an error might not be optimal. If you are generating the field to write, you need to explicitly check if the array is empty; otherwise, you will get an error.
In contrast to this, isn't writing 0 files a valid operation and you will also get an array with 0 results so everything is ok?
From the perpective that there will likely be less control over which files, if any, are generated, your point makes sense. I will allow empty arrays
@ValentaTomas @0div
write
method is confusing
path_or_files: str | List[WriteEntry],
data_or_user: WriteData | Username = "user",
user_or_request_timeout: Optional[float | Username] = None,
@ValentaTomas @0div
- It's not completely clear to me how the usage looks like, @0div can you please share code snippets for JS and Python showing how to use the new API? I think especially in Python naming of these parameters in the
write
method is confusingpath_or_files: str | List[WriteEntry], data_or_user: WriteData | Username = "user", user_or_request_timeout: Optional[float | Username] = None,
// legacy single-file write method signature
sandbox.files.write("path/to/file.txt", "This is a test file")
// new overloaded multi-file write method signature:
sandbox.files.write([
{ path: 'test_file_1.txt', data: 'This the first test file.' },
{ path: '/test_dir/test_file_2.txt', data: 'This the second test file.'},
])
see js-sdk test for more examples
# legacy single-file write method signature
sandbox.files.write("path/to/file.txt", "This is a test file")
# new overloaded multi-file write method signature:
sandbox.files.write([
{ "path": 'test_file_1.txt', "data": "This the first test file." },
{ "path": '/test_dir/test_file_2.txt', "data": "This the second test file."},
])
see python-sdk test for more examples
I added @jakubno as a reviewer to check the Python SDK as he's the one with the most Python experience on our team
I would argue for splitting the methods, there are two different functions for the user. I think 90% of users will use the simple version for which I like the current signature (no need for custom objects). The only downside is, there will be 2 ways for users to upload data. But if we name this
multifile_upload
ormultifile_write
it shouldn't be confusing
I think we could even have just the multi-file support in both SDKs because if you want to write a single if, it's just an array with a single item. The question in my head is whether it's better DX of having a single method that has a clear arguments but user needs to always pass an array, even for a single file. Or if it's better to have separate functions for single file and multiple files write.
Alternatively, if we want to go Kuba's road with separate functions we could have:
sandbox.files.write()
- for a single filesandbox.files.write_multiple()
- for multiple files
Same in JS? I usually don't like the passing arrays.
When you pass an array the function will return an array and will it have only one element? should I check it has the element or should I risk index out of bounds error?
One more thing I want to discuss - WriteData
in JS SDK.
The WriteData
type is this:
export type WriteData = string | ArrayBuffer | Blob | ReadableStream
We're using it for example here:
export type WriteEntry = {
path: string
data: WriteData
}
WriteEntry
is a single item in multi file write - sandbox.files.write([{ path: string, data: WriteData }])
I'm thinking of maybe we should get rid of this type alias and have it directly as this:
export type WriteEntry = {
path: string
data: string | ArrayBuffer | Blob | ReadableStream
}
The reason would be to make it easier for users to immediately see what data
is in their type hints. Otherwise they need to do additional clicks and find what WriteData
is in our codebase.
I usually don't like the passing arrays.
When you pass an array the function will return an array and will it have only one element? should I check it has the element or should I risk index out of bounds error?
That's a good argument. I agree. I don't like dealing with arrays and checking the length as well
I also like the single upload method because it is something users already use a lot; requiring to always pass an array didn't sit well with me.
For splitting the methods — I really don't want to end up with more methods for the same thing on the same level.
export type WriteEntry = { path: string data: string | ArrayBuffer | Blob | ReadableStream }
I think this is the better solution here than the WriteData
I originally proposed.
I usually don't like the passing arrays. When you pass an array the function will return an array and will it have only one element? should I check it has the element or should I risk index out of bounds error?
That's a good argument. I agree. I don't like dealing with arrays and checking the length as well
For the multifile upload, you have to pass an array though, right?
EDIT: If I understand it correctly, this is an argument for having the single file upload there, right
I think this is the better solution here than the
WriteData
I originally proposed.
It would be great if we adopt this approach in general. With Typescript it's a thin line between using types to hide some abstraction and over-using types.
For splitting the methods — I really don't want to end up with more methods for the same thing on the same level.
I can maybe see us using the overrides in JS but I really don't like this write signature in Python:
path_or_files: str | List[WriteEntry],
data_or_user: WriteData | Username = "user",
user_or_request_timeout: Optional[float | Username] = None,
It's not clear at all how to use these methods just from looking at the code. We should really aim for trying to be as clear as possible just from the code.
It creates many questions in my had, why is data and user mixed together? Why is user and request timeout mixed together? It's very unintuitive
Just to clear up a possible confusion here, because when I checked the start of the discussion it might not be clear:
path_or_files: str | List[WriteEntry], data_or_user: WriteData | Username = "user", user_or_request_timeout: Optional[float | Username] = None,
This signature should never be seen by the user, because the method is
@overload
ed, so the visible method signatures will be:
This will be default and selected first if you start typing:
@overload
def write(
self,
path: str,
data: WriteData,
user: Username = "user",
request_timeout: Optional[float] = None,
) -> EntryInfo:
You can switch to this one explicitly or implicitly by passing an array (because of duck typing) as first argument or files
argument.
@overload
def write(
self,
files: List[WriteEntry],
user: Optional[Username] = "user",
request_timeout: Optional[float] = None,
) -> List[EntryInfo]:
Just to clear up a possible confusion here, because of the start of the discussion:
path_or_files: str | List[WriteEntry], data_or_user: WriteData | Username = "user", user_or_request_timeout: Optional[float | Username] = None,
This signature should never be seen by the user, because the method is
@overload
ed, so the visible method signatures will be:
- This will be default and selected first if you start typing:
@overload def write( self, path: str, data: WriteData, user: Username = "user", request_timeout: Optional[float] = None, ) -> EntryInfo:
- You can switch to this one explicitly or implicitly by passing an array (because of duck typing) as first argument or
files
argument.@overload def write( self, files: List[WriteEntry], user: Optional[Username] = "user", request_timeout: Optional[float] = None, ) -> List[EntryInfo]:
I see. I must have missed the @overload
, sorry. What does the user
option do? Is that new?
I'll leave @jakubno to add any notes if has anything additional. As I said - he has the most Python experience
This signature should never be seen by the user, because the method is @overloaded, so the visible method signatures will be:
It ain't true, users can see the source code for python libraries and I often go there
The user
is an optional argument that can be used to modify who you are making the filesystem operation as — if you need to delete a file owned by root or create a file that is owned by a different user that the default one.
We had a lot of confusion around the users/permissions, so having this explicitly visible and modifiable seems better than hiding it again.
EDIT: It is new in a sense that it is on all relevant methods in the Beta SDK.
This signature should never be seen by the user, because the method is @OverLoaded, so the visible method signatures will be:
It ain't true, users can see the source code for python libraries and I often go there
Well, I cannot argue with that. At that point isn't the usage clear?
EDIT: What I wanted to say — are you worried about the overload implementation being confusing to people inspecting the code?
The
user
is an optional argument that can be used to modify who you are making the filesystem operation as — if you need to delete a file owned by root or create a file that is owned by a different user that the default one.We had a lot of confusion around the users/permissions, so having this explicitly visible and modifiable seems better than hiding it again.
Is this only filesystem thing? Or on all methods now? My immediate thought would be to expect something like this on every relevant method.
We had a lot of confusion around the users/permissions, so having this explicitly visible and modifiable seems better than hiding it again.
I'm pretty sure this won't solve it. People will be still confused and won't know what to pass there. The solution is that users shouldn't need to mess with permissions by default. The filesystem should be accessible to the default user. I had very relevant feedback for this from our hackathon:
Problem uploading data to sandbox and not being able to open the file because of permissions. Had to use
sudo
. Everything was in/home/user
.
Would the pythonic *args
(Non-Keyword Arguments) be a good middle-ground here?
And its equivalent in TS function sum(...args) {
That way a dev can use one or more args or w/o using arrays.
Would the pythonic
*args
(Non-Keyword Arguments) be a good middle-ground here? And its equivalent in TSfunction sum(...args) {
That way a dev can use one or more args or w/o using arrays.
I think it would be slightly problematic (more so in TS) because we also want to pass the opts
for the request.
plus don't we lose all the type hints?
Is this only filesystem thing? Or on all methods now? My immediate thought would be to expect something like this on every relevant method.
Yes, it is on all relevant methods.
By default you don't need to mess with this, but when interacting with filesystem, not having this exposed would lead you to not being able to do this. So that's why I choose to do less magic here — the user you pass is used for the operation, there is default of user
that is being user, but you can override it.
Is this only filesystem thing? Or on all methods now? My immediate thought would be to expect something like this on every relevant method. Yes, it is on all relevant methods.
By default you don't need to mess with this, but when interacting with filesystem, not having this exposed would lead you to not being able to do this. So that's why I choose to do less magic here — the user you pass is used for the operation, there is default of
user
that is being user, but you can override it.
What are the use cases when you need it? It's a simple parameter but it opens a whole can of new questions like "how do I create a new user?" or "how do I switch to a different user when running a command?" that we'll need to provide answers for. That's why I'm not a big fan of this. It feels only half way there because it's missing all the technical content and utilities around it.
plus don't we lose all the type hints?
The variadic argument could be typed, but generally, it should be the last argument. Otherwise, it won't work.
Sorry to get in between, but for the Browserbase SDK I was in a similar situation, we ended up with a load_url
method to load a single web page, load_urls
to load multiple and a load
method to handle both (depending on input)
Sorry to get in between, but for the Browserbase SDK I was in a similar situation, we ended up with a
load_url
method to load a single web page,load_urls
to load multiple and aload
method to handle both (depending on input)
Thanks for the input. Would you do it the same way again?
I think I'm leaning increasingly more towards two separate methods. I'm not sure about having a third one to potentially handle both single and multi use case.
Would do two methods. My thinking was (still is): I want a function to return X. If a function should return Y, I make a function that returns Y. A function that returns both X and Y is flaky
@mishushakov Why you ended up with three in the end?
Because it felt like if we have the other two why not have it also. But tbh., the whole thing was out of necessity, because I didn't want to start a browser every time I want to fetch a URL, load_urls
would connect once and do the same thing as load
for multiple urls. I actually prefer the Node way - you can files.map > fs.writeFile > Promise.all(files)
Three sounds the worst :D
Let's do the two separate if everybody agrees:
write_file
/writeFile
write_files
/writeFiles
Three sounds the worst :D
Let's do the two separate if everybody agrees:
write_file
/writeFile
write_files
/writeFiles
I agree about separate methods but not sure about the naming because there's already files
module:
sandbox.files.writeFile() / sandbox.files.write_file()
sandbox.files.write_files() / sandbox.files.write_files()
Maybe feels a little weird?
Alternative namespace solutions:
sandbox.fs.write_file
(Not sure if it is clear that you should use this for filesystem)sandbox.filesystem.write_file
(This seems too long)sandbox.write_file
(Can we get rid of all namespaces without making everything confusing?)EDIT: I actually feel that maybe not having namespaces lowers the mental difficulty for me when using the SDK — you just type what you want to do as opposed to having to think in which namespace the method should be.
EDIT 2: Also I would not even argue for overloaded write
method without the files
namespace, because without namespaces you need to name it more specifically like writeFile
which tells you that it takes just one file.
Alternative namespace solutions:
sandbox.fs.write_file
(Not sure if it is clear that you should use this for filesystem)sandbox.filesystem.write_file
(This seems too long)sandbox.write_file
(Can we get rid of all namespaces without making everything confusing?)EDIT: I actually feel that maybe not having namespaces lowers the mental difficulty for me when using the SDK — you just type what you want to do as opposed to having to think in which namespace the method should be.
EDIT 2: Also I would not even argue for overloaded
write
method without thefiles
namespace, because without namespaces you need to name it more specifically likewriteFile
which tells you that it takes just one file.
Keeping a namespace or prefixing the method name could help disambiguate a bit since that kinduv io touches other linux abstractions.
I'm going with no namespaces. With Stripe SDK it makes sense to have namespaces as they have different "products" and the products have different methods available. We only have one "product" (or class) which is the sandbox, so we should only have sandbox methods on top-level, eg: sandbox.writeFile.
I'm going with no namespaces. With Stripe SDK it makes sense to have namespaces as they have different "products" and the products have different methods available. We only have one "product" (or class) which is the sandbox, so we should only have sandbox methods on top-level, eg: sandbox.writeFile.
The question is then is this a single product API or a VM (with its components) API?
Description
Filesystem.write
method to accept multiple filesenvd
supports multipart with multiple files out of the boxTest