Closed sntran closed 3 years ago
It should be uploading whilst it receives data from the process.
Note that read buffers are used, so if the download is particularly slow, they can take a while to fill. By default, it tries to read 1400KB (2x 700KB articles) before doing any uploads; you can change this behaviour with the `--disk-req-size=700K` option.
Also check that `rclone cat` isn't impeding the process with its own buffering.
The `Reading` line in the output indicates that it's begun reading the file. The progress indicator shows how much has been uploaded, so if that's moving, it should be progressing. You can also use the `-v` switch, which displays every article once it's been posted.
You can also try reducing both `--disk-req-size` and `--article-size` to something really small to check that it's not trying to download the whole file before uploading.
Hi,
I don't think the download speed is slow. I usually get around 20MB/s downloading. I think the upload is the bottleneck, as it seems to be around 9MB/s.
However, there was no progress indicator. The last line was "Reading file", and it stuck there.
I should note that the file was 3GB, so even with `--disk-req-size=1400K`, it should continue reading.
I understand that it may be hard for you to debug when you don't use `rclone`. If you're willing, I can see if there is a way for me to set up a test project for you to try.
Right, thanks for the info.
Are you able to check that the download is actually occurring (e.g. rclone isn't halted waiting on user input)?
You may be able to check with something like:
`rclone cat ... | nyuu 'procjson://"filename",100,0' ...`
(the `0` there indicates stdin)
...which doesn't suppress stderr/stdin of rclone
I just tested `rclone cat` with a large local file off a slower drive, and can see it progressing as it reads the file, so it seems to work there. I can't test with a Google Drive backend though (account got banned).
If you want to try taking rclone out of the equation, you can try a wget/curl command with a URL like http://cachefly.cachefly.net/100mb.test to see if it seems to progress with that.
Hi again,
Yes, it seems to be a false alarm. `rclone` did download, and `nyuu` did upload. I believe I only saw the line `[INFO] Reading file Big_Buck_Bunny_4K.webm` without the progress indicator because I used Node.js to spawn `nyuu` and pipe its `stderr` to my stream, which does not handle progress-style output.
Which kinda sucks, as `nyuu`'s progress bar provides lots of information. I suppose it only works when connected to a real terminal. I have checked the `--progress` options, and none of the other options work for me. But that is beyond the scope of this issue.
Closing the issue. Thanks for explaining!
Thanks for finding that and reporting back!
If you have suggestions on what could help regarding the progress indicator, feel free to mention them.
I'm not sure how I could be of any help, since I am not very familiar with the way the console outputs text.
FWIW, I am using the Node.js `readline` module to read the lines from `stderr`, where `nyuu` outputs progress. I think the way you render the progress bar is not read correctly, or is ignored, by `readline`. That's probably why I stopped seeing further lines after `[INFO] Reading file Big_Buck_Bunny_4K.webm`.
As far as I understand, `readline` looks for `\n`, `\r`, or `\r\n` to determine a line. Looking at these lines, I don't see any of those end-of-line characters.
This is where I stopped, as I have no clue how the cursor works. However, if there is a way to determine whether a terminal is attached to stderr, and switch to using `\r\n` when it isn't, that may get the progress bar to work with `readline`. It will probably output repeated progress lines, but that's fine, as I can handle it.
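For what it's worth, Node.js does expose whether a stream is attached to a terminal via the standard `isTTY` flag; a minimal sketch (the `progressStyle` helper name is made up for illustration):

```javascript
// Node.js marks streams attached to a terminal with the isTTY flag.
// A CLI can use this to decide between an interactive progress bar
// (cursor redraws) and plain newline-terminated log lines.
function progressStyle(stream) {
  return stream.isTTY ? "interactive" : "plain-lines";
}

// When stderr is piped to another process (as in the spawn case above),
// isTTY is undefined, so this reports "plain-lines".
console.log(progressStyle(process.stderr));
```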
By default, Nyuu doesn't spit out any progress if it's not connected to a terminal. The idea being that terminal escape codes don't mean a whole lot outside the terminal (particularly if you're redirecting the output to a file).
Perhaps you might want to try `--progress log:1s`, which outputs a line of progress every second. Alternatively, you could use the TCP server and query the status for progress.
Unfortunately the output Nyuu generates is meant to be read by humans, so it isn't the friendliest to parse.
Right. Thanks for walking me through the trouble.
I tried the TCP option first, but setting up a TCP socket that keeps polling the server was a little too much for my script. Not a big deal, and parsing the log is easier :) I could get the data I want from the log. Thanks again!
Not sure if I should open a new issue for this, but I think it would be better if you could also support another option for `--progress` that takes a template string for the log. For example, the default would be:
`"Article posting progress: {articlesRead} read, {articlesPosted} posted, {articlesChecked} checked"`
These variable replacements could be the properties of the uploader. It would also be helpful if we could get the average transfer speed from it.
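A sketch of what that substitution could look like (a hypothetical feature, not something nyuu implements; the placeholder names mirror the example above):

```javascript
// Replace {name} placeholders with matching uploader stats.
// Unknown placeholders are left untouched rather than erased.
function renderTemplate(template, stats) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    key in stats ? String(stats[key]) : match
  );
}

const line = renderTemplate(
  "Article posting progress: {articlesRead} read, {articlesPosted} posted, {articlesChecked} checked",
  { articlesRead: 12, articlesPosted: 10, articlesChecked: 8 }
);
console.log(line);
// → "Article posting progress: 12 read, 10 posted, 8 checked"
```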
I can look into a PR for it if your hands are full :)
Oh yeah, `log` doesn't display totals/percent progress.
If the aim is to have it read by an application, it probably makes sense to adopt something more designed for parsing (like outputting everything in JSON). A template formatting string just seems like something that is only applicable to a specific scenario.
Sounds like it could work for you?
JSON is easier to parse, but it will need to include all the data.
A template formatting string allows the user to choose what they want to include.
> but it will need to include all the data.

Not sure if I missed something, but is that an issue?
Sorry, I didn't mean it as an issue. But personally, if I plan to read the log from an application, parsing a full JSON blob every second or so through `stderr` seems wasteful, when I will mostly just convert it into a progress line. But that is just my use case. I have no objection to having JSON.
The idea is not to be restricted to the use case of a single application. For example, if someone wanted to create a GUI, then text is likely not what they want.
JSON parsing runs at tens of megabytes per second at least - the difference between 100 and 1000 bytes is a fraction of a millisecond. Yes, there's some waste, but it's largely irrelevant, particularly in comparison to the cost of IPC.
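To illustrate the point: reducing a hypothetical JSON progress blob to a one-line summary is both simple and cheap (the field names are invented for this sketch; nyuu does not currently emit JSON progress):

```javascript
// Turn a (hypothetical) JSON progress blob into a short progress line.
// JSON.parse on a ~100-byte object takes on the order of microseconds.
function progressLineFromJson(jsonLine) {
  const p = JSON.parse(jsonLine);
  return `${p.articlesPosted}/${p.articlesTotal} posted, ${p.articlesChecked} checked`;
}

console.log(
  progressLineFromJson(
    '{"articlesRead":12,"articlesPosted":10,"articlesChecked":8,"articlesTotal":100}'
  )
);
// → "10/100 posted, 8 checked"
```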
Thanks for the suggestion nonetheless!
Hi there,
First, thank you so much for making this library, it works really well.
Recently, I need to upload a file from a Google Drive. I use `rclone` to handle the remote storage. I was thinking of using this for the input to `nyuu`:

`procjson://"[name]",[size_in_bytes],"rclone cat 'drive:path/to/file'"`

In which `rclone cat` is a command that pipes the content of a remote file to `stdout`. The size of the file can also be calculated by `rclone`, so that variable is fine. However, from the look of it, it seems that `nyuu` will try to read the whole file first before starting the upload. This is extremely slow, since the file is basically downloaded fully first. I assume that's the case, based on this log line, which stays on screen for a while:

`[INFO] Reading file Big_Buck_Bunny_4K.webm`

Is that the default behaviour of `nyuu`? Can we make it so that it will start uploading immediately as the file is being downloaded from the remote?