async Init file function: streams and fileAdded events

drzraf commented 3 years ago

This is a follow-up of https://github.com/flowjs/flow.js/pull/296 In that PR, support was introduced for async initFileFn which allows to read from a stream and/or perform other async operation before Flow.js proceed with a given file.

One very important aspect of the FlowFile bootstrap is size determination (by default, from the regular file.size) in order to compute the number of chunks needed. A initFileFn could affect this computation.

FlowFile initialization is followed by the fileAdded event which serves two purposes:

As a hook, which allow consumer to reject this file
As a notification, which inform consumer that bootstrap is fully done and further processing [eg: flow.upload()] can be called (Flow.upload() would fails if the FlowFile.chunks array was not computed)

One issue is that an async initFileFn could finish after fileAdded fires. So one question is: Should the initFileFn function really run if a file is going to be rejected?

If yes (eg, because rejection depends upon the number of chunks/file.size), then a PR could be made so that the event is fired after initFileFn (even if async). It would move fileAdded inside FlowFile's bootstrap function unless a nicer and higher-level solution exists (like of more pipeline-like vision of the process). I tend to think there should be less assumptions about file.size at the beginning of the process (eg: FlowChunks could be built on-demand later) but this hook in its current shape forbids this.

Side note about Events: I think that having events to expect a return value makes them more hooks than events. It's a bit counter intuitive and keeps the code from moving to native Events. Not sure they are worth it. The fileAdded native event could just pass the corresponding (read-only) reference to Flow and the FlowFile and the consumer could delete it as well (accessing to the global flow object). It'd be more flexible.

AidasK commented 3 years ago

Agree, initFileFn should not cross with any event. One should wait for other to complete. Feel free to break something as long as it makes it cleaner.

drzraf commented 3 years ago

I tried hard to provide a publishable PR where addFiles would have supported both async Promise-mode and non-async (regular) calls (using 4 asyncReadFileFn, asyncInitFileFn, readFileFn and initFileFn helpers).

But this implied crazy code complexity and messy method prototypes (Promise-based / await use vs regular blocking calls + async vs non-async bootstrap()). I ended up getting tests messing around one with the other or strange endless loop behavior.

Back on the drawing board, I'll probably propose a new PR iteration introducing only following prototype:

addFiles([files], initFn) // The backward compatible/unchanged one
asyncAddFiles([files], initFn) // The new one allowing stream/asynchronous init function where lays the related complexity in terms of events and hooks ordering

For upload(), we don't actually need to change anything but it may be clever to provide an asyncUpload({Function asyncReadFileFn = null, Function readFileFn = null}) or similar so that users:

can just await asyncUpload() which would be equivalent to listening to the complete event.
could override the readFileFn and use the blocking (or non-blocking) helper variants. It's handy and makes clearer that the read helper override only applies to upload().

AidasK commented 3 years ago

Maybe you could drop backward compatibility and only support promise based one? addFiles([files], initFn) // The backward compatible/unchanged one will be replaced with this one asyncAddFiles([files], initFn) // The new one allowing stream/asynchronous init function where lays the related complexity in terms of events and hooks ordering

Supporting both variants without a BC adds some complexity too. So why not break this and reduce our code base? It will be easier to read ant less likely to make a mistake. You can delete entire test too and rewrite it by your needs. Yes, it might work differently, but it's ok if it does it's job

drzraf commented 3 years ago

But that would force users handling regular files (the majority) to use a Promise-based approach with no gain:

Without an await / then they would have a surprising behaior
It would force them to reorganize their code
It would sound tricky is break common assumption

Given that the async approach is mostly useful for streams (whether network or local) and these are not going to represent the majority of cases, it seems safer to keep the regular function (node.js fs module is providing operation like statFile / statFileSync too)

AidasK commented 3 years ago

I don't think anyone currently using flow.js is going to upgrade to v3. v2 is already stable and they don't need to upgrade if it works. v3 is for new folks and it should be simple for them to get started.

Though you have the point, we can have sync and async separate methods. I leave this decision for you. You have my support in both cases now 👍

evilaliv3 commented 3 years ago

I'm in line with @AidasK in general.

With this change we will be dropping support for IE11 that anyway will be officially dead in July.

What about @ng-flow and @ngx-flow? Shall we maybe issue two versions updated to support v2 and v3 respectively? Or maybe we could keep ng-flow as is and port ngx-flow to v3?

@MartinNuc what do you think?

AidasK commented 3 years ago

ng-flow will be left with v2 version, since angular v1 is quite dead as well. As for ngx-flow, we still need to finish v3, it's a bit early to integrate it.

evilaliv3 commented 3 years ago

Sound good for me to proceed with this plan :)

drzraf commented 3 years ago

Sorry for going off-topic too but:

For a side project I'm currently using ngx-flow (and intend to keep it working with v3).
One very interesting development would be a Vue component. But this needs more time (to be done right) than I currently have myself.

About the full picture, a user made a Flow.js fork (I'd say "offensive": no git history, dropped authors & licensing) then created a Vue component which only works with his fork due to API changes. Still that could be a source of inspiration.

AidasK commented 3 years ago

I wasn't aware of it and it's quite popular. Too sad he hasn't contacted us before making this fork, we could have done an awesome collaboration.

For sure any inspiration is ok and should be used. I have added @drzraf as a member of the whole @flowjs/developers team, so you should be able to manage every repository. Feel free to create a vue js one if you want. 🍾

For bright flow.js future I have promoted @evilaliv3 to the owner. More of use are involved, the better 🏅

drzraf commented 3 years ago

I'd like to further divide events, inspired by WordPress model with its filter vs action hooks.

Why so many hooks? Mostly because of user relying on dropzone and other higher-level components. In these cases the flow between addFiles and upload is hardly configurable by the user contrary to those able to await flow.asyncAddFiles(); flow.upload();
Most events act as filters : They expect to stop processing at the first false value and the code itself may disregard the boolean return value (making them events).
There is no thing such as action where a Flow(File) property is altered by the processing hook.

The distinction between

return true/false to continue processing
return (possibly) altered data passed as a parameter .... is, I think, necessary. A possible use case could be for filesAdded to return a reordered file list (eg: based on size) what is not doable with filter-oriented hooks alone.

Do you approve such a road? We would end up with:

Type	Sync version	Async version	Stage	Params	Notes
Filter	preFilterFile	asyncPreFilterFile	addFiles	`file, event`
Processing hook / Action	fileAdded	asyncFileAdded	addFiles	`flowfile, event`	Can be used as a "post-initialization" file filter : `delete flowfile.file` will dequeue that file
Processing hook / Action	filesAdded	asyncFilesAdded	addFiles	`flowfiles, event`
Processing hook / Action	filesSubmitted	asyncFilesSubmitted	addFiles	`flowfile, event`
FlowFile initialization	initFileFn	initFileFn ¹	addFiles	`flowfile, event`
FlowFile reading	readFileFn	asyncReadFileFn	addFiles	`flowfile, startByte, endByte, flowfile.file.type, flowfile`
Event	fileRemoved	²	removeFile	`flowfile`
Event	fileSuccess	²	upload	`flowfile, message, last-chunk`
Event	fileError	²	upload	`flowfile, message, last-chunk`
Event	fileProgress	²	upload	`flowfile, chunk`
Event	fileRetry	²	upload	`flowfile, chunk`
Event	uploadStart	²	upload
Event	complete	²	upload
Event	progress	²	upload
Event	error	²	upload	`message, flowfile, chunk`
Event	catchAll	²	*	`event-name, ...other arguments`

¹ Whether or not initFileFn should return a Promise and is awaited for depends on whether addFile(s) or asyncAddFile(s) was called. ² Events are always async in the sense of fire and forget style.

Rule is : If any of initFileFn or any of the hooks is async, the whole addFiles must be async, that is, asyncAddFiles must be called instead. This could be detected with callback.constructor.name === "AsyncFunction" (for non-transpiled code) but the user (or the upper layer) is the only one to know what to do from the Promise and that's the reason the public asyncAddFile(s) method must be introduced.

Do you approve such an approach and pursuing the PR in that direction?

AidasK commented 3 years ago

Great table. I think it makes much sense and it could be added to the docs too! Great explanation :)

drzraf commented 3 years ago

In the original post I mentioned moving to native events which means extending the EventTarget class so that the Flow object inherit addEventListener() and can call this.dispatchEvent().

This seems great to replace fire and forget-like events when a return value is not needed. These can be safely dispatched within an async or a setTimeout) and it's now easy to make any object targetable.

If we keep compatibility with .on(), then the addEventListener callbacks must be wrapped (so that listeners receive the same data as before, instead of a CustomEvent annd it's detail property). But wrapping callbacks makes harder to track (and ultimately remove them). [Always that same problem with the native ES Event API].
Breaking compatibility in favor of native CustomEvent (for the fileRemoved, success, ...) means Flow v2 event handlers signature would not fit anymore. That's a big breakage.
It's possible to overload EventTarget.prototype.addEventListener ending up with this kind of code but it's still feel dirty and outside the scope of Flow.js.
An ideal emitter one would:
1. Rely upon CustomEvent
2. Provide a mixin (easy to extends)
3. Add the necessary removeAllEventListeners and/or wildcard-like event removal semantics (what means tracking listeners)

I went over these libraries:

Some of them work by having on() to return the actual callback being registered so that it can later be consistently removed. They also handle all listeners tracking and offer handy shortcuts for fully or partial removal. But none of them rely upon native CustomEvent under the hood (except the last which is suboptimal regarding other aspects) : They all just synchronously run callback code. I wonder what am I overlooking.

Unless you've some opinion on the subject, I'm likely to go with the EventTarget.prototype.addEventListener overloading option. By the way, I'm about to split all the event-handling related code into a distinct file to make Flow.js code more focused and readable.

Note: I also feel it'd be good to distinguish between events and hooks by changing the way to attach listener. Could be on() for events and hook() or listen() or attach() or addAction() for hooks.

AidasK commented 3 years ago

I would be against overloading. Let's assume that most of our users are not going to upgrade to v3, so we can just drop support for on/off methods. Anyway it's a simple replacement of on to -> addEventListener, it's not that hard. We can mention this in the changelog/upgrade guide. Library will be smaller without overloading and simpler to use.

So I would go for plan 2.

AidasK commented 3 years ago

As for distinguish of events and hooks. Why not moving hooks to params? Do we really need to bind multiple handlers for one hook?

AidasK commented 3 years ago

The other reason why I want to make breaking changes for this code is that flow.js to be future proof. Of course it's going to break ngx-flow integration, but that's it. We have no other dependencies.

New flow.js should be simpler, easier to use and should cause less confusion. By dropping overloading it's for sure going to be simpler

flowjs / flow.js

async Init file function: streams and fileAdded events #319