dominictarr / split


Improvement: EOF or 'finish' event #29

Closed. Vandivier closed this issue 6 years ago

Vandivier commented 6 years ago

Consider the following use case, modified for simplicity:

const fs = require('fs');
const { EOL } = require('os');
const splitStream = require('split');
// wsWriteStream (an fs write stream) is assumed to be created elsewhere

function fParseTxt(sLocation) {
    const regex = new RegExp(EOL);
    const rsReadStream = fs.createReadStream(sLocation);

    rsReadStream
        .pipe(splitStream(regex))
        .on('data', fHandleData)
        .on('end', fEndProgram);
}

function fHandleData(sLineOfText) {
    wsWriteStream.write(sLineOfText + EOL);
}

function fEndProgram() {
    console.log('Program completed.');
    process.exit();
}

This program will exit before write operations are completed.

One way to solve this issue: split could check whether there is a next chunk and, if not, emit a special 'end of file' event.

dominictarr commented 6 years ago

You don't need to call process.exit(); Node will exit on its own once the event loop is empty. If your Node program is staying open, there must be something else happening...

If you really want to do it this way, wsWriteStream is what should emit the finish event, not split.
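
For example, something like this (untested, and assuming wsWriteStream is an ordinary fs write stream created elsewhere): close the write stream once split has emitted its last line, and only treat the program as done once 'finish' fires:

function fParseTxt(sLocation) {
    fs.createReadStream(sLocation)
        .pipe(splitStream(new RegExp(EOL)))
        .on('data', fHandleData)
        // split is done emitting lines: close the write stream...
        .on('end', () => wsWriteStream.end());
}

// ...and the writes are only guaranteed to be flushed once 'finish' fires
wsWriteStream.on('finish', fEndProgram);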

Vandivier commented 6 years ago

I'll go double check and provide a repro, but I believe I tested this before filing.

I recall that wsWriteStream's finish was emitted many times: the earliest came before rsReadStream's end, and several more came after it.

dominictarr commented 6 years ago

If it does that, the ws stream is wrong, not split. File an issue there.

tswaters commented 5 years ago

I'm using just split to read a large file and iterate over each record.

I also open up a database connection and do something with each line.

Ideally, I'd like an end event to close the database connection so the node event loop can close cleanly.

However, end is fired almost immediately, and calling client.end() (it's pg) means I can't use the database connection during processing.

Is there a way to know with this library if the processing is done?

pseudocode:

const fs = require('fs');
const split = require('split');
const { Client } = require('pg');
// logger and getQueryFromLineBuffer are defined elsewhere

(async () => {
    const client = new Client()
    await client.connect()

    fs.createReadStream('./var/some-large-file')
        .pipe(split())
        .on('data', async function (line) {
            try {
                await client.query(getQueryFromLineBuffer(line))
            } catch (err) {
                logger.error(err)
                await client.end()
                process.exit(1)
            }
        })

    // todo: figure out how to close client :(

})().catch(err => { console.error(err); process.exit(1) })

dominictarr commented 5 years ago

If I was doing that, I'd have two variables, streamEnded and processingEnded, and only call process.exit when both of them are true. You don't really know which one finishes first, so check them both in the stream's end handler and at the end of processing.

tswaters commented 5 years ago

Sorry, this isn't very clear to me. Is there a hook I can use to set processingEnded?

I imagine streamEnded would get set to true in the end event, but I'm not sure about processingEnded -- the only function I have is the data event which is fired once for each line. If I don't know how many lines there are, how can I know when processing has ended?


(async () => {
    let streamEnded = false
    let processingEnded = false

    const client = new Client()
    await client.connect()

    fs.createReadStream('./var/some-large-file')
        .pipe(split())
        .on('data', async function (line) {
            try {
                await client.query(getQueryFromLineBuffer(line))
            } catch (err) {
                logger.error(err)
                await client.end()
                process.exit(1)
            }
        })
        .on('end', () => {
            streamEnded = true
            checkDone()
        })

    // todo: figure out when to set `processingEnded` to true

    function checkDone () {
        if (streamEnded && processingEnded) {
            client.end()
        }
    }

})().catch(err => { console.error(err); process.exit(1) })
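
The closest I've got is a rough, untested sketch that counts in-flight queries and treats processing as ended once the stream has ended and that count drops back to zero:

(async () => {
    let streamEnded = false
    let pending = 0 // queries currently in flight

    const client = new Client()
    await client.connect()

    fs.createReadStream('./var/some-large-file')
        .pipe(split())
        .on('data', async function (line) {
            pending++
            try {
                await client.query(getQueryFromLineBuffer(line))
            } catch (err) {
                logger.error(err)
                await client.end()
                process.exit(1)
            } finally {
                pending--
                checkDone()
            }
        })
        .on('end', () => {
            streamEnded = true
            checkDone()
        })

    function checkDone () {
        // done once the stream has ended and no queries are still running
        if (streamEnded && pending === 0) {
            client.end()
        }
    }

})().catch(err => { console.error(err); process.exit(1) })

No idea if that's the intended way to do it with this library, though.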