Azure / azure-storage-azcopy

The new Azure Storage data transfer utility - AzCopy v10

Do we need to write logs to `~/.azcopy` directory? #221

Closed. ppanyukov closed this issue 9 months ago.

ppanyukov commented 5 years ago

Which version of the AzCopy was used?

/azcopy_linux_amd64_10.0.7/azcopy

Which platform are you using? (ex: Windows, Mac, Linux)

Linux

What problem was encountered?

Logs are written to the ~/.azcopy directory, as per:

Log file is located at: /home/ppanyukov/.azcopy/99599e85-8ef6-d844-595e-ca9c12a8f762.log

I would say this is not what programs on Linux normally do. Writing logs to a dot directory in the user's home certainly feels unorthodox.

My preference would be to write to stdout and stderr as appropriate by default, with an optional --log parameter or similar to override this (I'm not even sure that is necessary, as I can easily redirect stdout/stderr wherever I need to).

What are people's thoughts on this?


Possible hacky workaround: run azcopy in a Docker container, which makes sure logs don't get stored anywhere locally for long. On the other hand, if the logs are required, running in Docker surfaces other issues.

zezha-msft commented 5 years ago

Hi @ppanyukov, thanks for reaching out!

I understand and appreciate your perspective. However, we'd prefer to save the log as a file, since there could be a lot of useful information if any transfer ever goes wrong. If we accepted a --log-file-location flag, what if the user doesn't provide one? One option would be to save the log file in the current directory from which the user is running the command; but that is not great practice either, since the user could be in the directory where they are trying to upload data from, and putting the log file there is undesirable.

You could customize the log location. Please refer to ./azcopy env.
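
For example, a minimal sketch (the AZCOPY_LOG_LOCATION variable is the one reported by azcopy env; the path is just illustrative):

# Illustrative path; variable name as listed by `azcopy env`
export AZCOPY_LOG_LOCATION=/var/log/azcopy
azcopy copy "<source>" "<destination>"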

ppanyukov commented 5 years ago

Well, I'm sure this is done with the best intentions, and I agree it's good to have logs, but programs for Linux should probably strive to fit what is customary on Linux. As I pointed out, it is more standard to write to stdout/stderr than to dot directories in the user's home, such as ~/.azcopy.

Why not take rsync as a model? It is a tool that does pretty much the same kind of job and is well established. That is not to say rsync is ideal; it's just what people have probably used and are likely to be familiar with.

BTW, I just discovered there is also a ~/.azcopy/plans directory which keeps growing. Is this something that needs to be kept after a job has finished?

Oh, and if we had a --log-file parameter or an env var AZCOPY_LOG_FILE_LOCATION (pointing to a file rather than a directory), we could give it /dev/stderr and that might kinda solve this.
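
Hypothetically, the proposal above would allow something like this (AZCOPY_LOG_FILE_LOCATION is the name proposed here, not an existing AzCopy setting):

# Hypothetical setting: point the log at a file, so /dev/stderr works as a target
AZCOPY_LOG_FILE_LOCATION=/dev/stderr azcopy copy "<source>" "<destination>"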

zezha-msft commented 5 years ago

Hi @ppanyukov, thank you for the suggestions!

~/.azcopy/plans contains the state files that allow AzCopy to resume failed jobs. They also allow the user to list all the jobs that ran in the past and query their results with ./azcopy jobs list and ./azcopy jobs show [job-ID]. We currently do not have a strategy for getting rid of these files, as we don't know how long the user wants to keep records of their old jobs.
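
For example (a sketch of the commands mentioned above; the job ID placeholder is whatever azcopy jobs list reports):

azcopy jobs list
azcopy jobs show <job-ID>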

I understand your concern, but it may be challenging to change the logging to output on stdout/stderr, since we already have interactive outputs that give the user progress updates. Logs are critical in helping our customers to investigate issues, as they could be very verbose and offer loads of useful information.

We can certainly add some kind of clean command that gets rid of these logs and plan files. Would that be a reasonable solution for you?

ppanyukov commented 5 years ago

So, if we run the sync command every 2 minutes or so, there will be quite a few "plan" files left behind?

I appreciate there may be several design goals here, and logging in general is not an easy thing.

Adding a clean command does not seem to achieve anything more than me running rm -rf ~/.azcopy after every run, unless I'm mistaken?

Well anyway, I'm probably not super fussed about this issue right now, considering the rm -rf workaround, and we are not using this tool anywhere near production or anything serious yet, although it's on the table as one of the options.

zezha-msft commented 5 years ago

Hi @ppanyukov, thanks for your feedback! We'll look into improving this part of the user experience.

The reason I suggested clean is that the user could specify another location for logs through an environment variable. It'd be pretty convenient to just call azcopy to remove both the logs and the plan files.

What if we provided a flag to indicate that we should delete the log and plan files if the job succeeds without error? Would you prefer this behavior?

On a side note, we've worked on shrinking the size of the plan files, which will be out with the next release.

arnobf commented 5 years ago

@zezha-msft for our use case, a flag to delete log and plan files in case of success - as you mentioned - would be very welcome.

jtmoree-github-com commented 5 years ago

+1 on not writing files to disk and expecting the user to figure out where they are and clean them up later.

> we'd prefer to save the log as a file, since there could be a lot of useful information, if any transfer ever goes wrong. If we accepted a --log-file-location flag, what if the user doesn't provide one?

If the user does not specify a log file location, it's because the user doesn't want log files filling up their disk. Forcing the user to accept log files forces everyone to write extra code to handle a situation which should not exist in the first place. We don't need to force every person using this utility to deal with files they don't need. When the support call comes in, we would tell the person to use the log option and send us the output. This is standard practice for well-written software.

Furthermore, the environment variable doesn't affect the plans folder and files. They are still written to the user's home folder, which means using the env var increases the amount of work everyone has to do. Now we have to delete files from two locations.

If we cannot stop the log and plan files in the short term, a 'clean' option is the next best thing.

JohnRusk commented 5 years ago

Thanks. As noted in #259, we're aiming to provide a cleanup option and to make various other improvements. Thanks for pointing out the issue with plan files not also moving. I've made sure that's also going to be considered.

kovas6 commented 5 years ago

I have to copy 4 TB of small files between two storage accounts, and all the logging you force is an absolute nightmare. I had to create an extra disk, which I will be charged for, just to keep logs I will never read.

adreed-msft commented 5 years ago

Hi kovas6. We're trying to reduce the logging at the moment. If I catch some spare time, I'll look into adding a global flag to AzCopy to turn logs off.

JohnRusk commented 5 years ago

@kovas6 (cc @adreed-msft). You can already turn the logs off. Just put this on the command line:

--log-level NONE

(Although I recommend using --log-level ERROR instead. That doesn't eliminate logs, but it does keep them very small.)
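
For example, on a copy command (a sketch; source and destination are placeholders):

azcopy copy "<source>" "<destination>" --recursive --log-level ERROR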

kovas6 commented 5 years ago

Another problem: 30 GB in the plans folder, and it still keeps growing while I am copying data between storage accounts. I am running out of disk space on the C drive. Any way we could disable those plans or at least move them to another folder/drive?

jtmoree-github-com commented 5 years ago

> @kovas6 (cc @adreed-msft). You can already turn the logs off. Just put this on the command line: --log-level NONE

--log-level NONE and --log-level=NONE act exactly the same as --log-level ERROR: azcopy still outputs plan files and a log file containing the same content as when ERROR is set as the level.

Furthermore, azcopy --help says nothing about the log-level option, Google finds nothing accurate about the log-level options, and the README in the azcopy source code says nothing about log levels.

This page mentions log levels but does not list NONE; since NONE does not work anyway, that is a bit moot: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-configure

JohnRusk commented 5 years ago

Oh, I'm surprised that NONE didn't work! Thanks for letting us know.

JohnRusk commented 5 years ago

@kovas6 Yes, we're aware of the plans issue and it's on our list to make it moveable.

jtmoree-github-com commented 5 years ago

> @kovas6 Yes, we're aware of the plans issue and it's on our list to make it moveable.

"make it movable" ? does that mean that plans files will still be written even if the NONE issue is fixed? Is there an issue/bug/item for it?

jtmoree-github-com commented 5 years ago

> Oh, I'm surprised that NONE didn't work! Thanks for letting us know.

shall I open an issue for NONE?

P.S. If you didn't know it was broken, then there must not be a test case for it. I see some test code in the source. Is the test suite maintained and run as part of development and release? There is Go and Python code; does the testing span two different languages?

JohnRusk commented 5 years ago

Yes, we do have two test suites: currently unit tests are in Go and integration tests are in Python. It does appear that the NONE case was not tested for. I think we'll track the NONE issue against this issue here, so no need to open another.

JohnRusk commented 5 years ago

> Does that mean plan files will still be written even if the NONE issue is fixed? Is there an issue/bug/item for it?

There's no way to not write the plan files. They're essential to the way AzCopy works. At first glance, that may seem like a bad thing. However, counter-intuitively perhaps, I believe it's actually a good thing. Before joining the AzCopy team, I spent a couple of years on an experimental tool that did not use plan files - instead it stored the equivalent information only in memory. I found that that caused more problems (e.g. around reliability, retries and restarts; and also around supporting high file counts with low memory). So, plan files are the best known option. But there are two things we do need to work on: (a) making them moveable and (b) better cleanup of the plan files from completed jobs. We have work planned for both of those.

adreed-msft commented 5 years ago

Hi there, while this doesn't totally fix this issue, --log-level NONE is now fixed and present in AzCopy v10.2.x. Work is still planned for making plan files movable and adding better cleanup options. Enjoy!

JohnRusk commented 5 years ago

A settable plan file location, and azcopy commands to cleanup old files, are coming in version 10.3

SteveBurkettNZ commented 5 years ago

Do the docs need an update to reflect the NONE option?

Also noted that the in-app help text (azcopy cp --help) doesn't mention the DEBUG, PANIC or FATAL options that the documentation does.

JohnRusk commented 5 years ago

@normesta Any chance of squeezing the addition of NONE into the docs? (If it's too late for this release, I think that's OK too.)

normesta commented 5 years ago

@JohnRusk - of course. Seems like a simple add to the docs. I assume NONE means no logging at all?

JohnRusk commented 5 years ago

@normesta Correct. But...

… the log-level parameter only controls the logging that we do to the AzCopy log file. There is also lower-level logging, of network errors and slow network calls, to the Event Log (Windows) and syslog (Linux). Maybe it would be worthwhile pointing out that this logging happens regardless of the setting of the --log-level parameter.

JohnRusk commented 5 years ago

BTW, Steve, I suspect that PANIC and FATAL might not actually work (they come from a library we depend on, but I'm not sure AzCopy actually implements them). We should check, and correct the documentation if it is the doc that is in error.

JohnRusk commented 5 years ago

> A settable plan file location, and azcopy commands to cleanup old files, are coming in version 10.3

Version 10.3 is now released. Details are here: https://github.com/Azure/azure-storage-azcopy/releases/tag/v10.3.0
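
For anyone landing here later, a minimal sketch of relocating both sets of files with 10.3+ (assuming the AZCOPY_JOB_PLAN_LOCATION and AZCOPY_LOG_LOCATION variables shown by azcopy env; the paths are illustrative):

# Illustrative paths; variable names as listed by `azcopy env` in 10.3+
export AZCOPY_JOB_PLAN_LOCATION=/mnt/scratch/azcopy/plans
export AZCOPY_LOG_LOCATION=/mnt/scratch/azcopy/logs
azcopy copy "<source>" "<destination>" --recursive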

jtmoree-github-com commented 5 years ago

I disagree that this issue is resolved; at the very least, the resolution is not better than the situation before.

Given that azcopy left plan files sitting around before version 10.3, I was already cleaning the files up myself by extracting job information from the output. If I understand the new feature correctly, I will still have to extract the job ID from the output of azcopy and take extra steps to get the files cleaned up.

I cannot use jobs clean if I am running multiple jobs concurrently, because that would trash the other jobs? Am I correct in my understanding of the situation?

I thought I recalled discussion of a switch to azcopy which would tell it to clean up all files when it was done with the current run. That would be the cleanest way to handle the plan files.

adreed-msft commented 5 years ago

We did discuss that (namely, cleaning up all azcopy-created job-related files on job success), and I'd implemented it a while back. If I recall correctly, we ended up deciding that we'd talk about how to tackle the job cleanup strategy in the future.

@zezha-msft and @JohnRusk, thoughts on re-implementing that cleanup strategy? It was an opt-in solution, rather than an opt-out solution, which means that if a user took issue with their transfer with the flag on, they could just re-run it with the flag off and we'd have debug info.

jtmoree-github-com commented 5 years ago

Could you clarify for me? What did you implement? Is there a switch that will tell azcopy to clean up after itself for one job? Or are you referring to a feature which was never integrated? Or is this the current jobs clean feature?

JohnRusk commented 5 years ago

It was not implemented. One of the reasons was that this, "if a user took issue with their transfer with the flag on, they could just re-run it with the flag off and we'd have debug info", doesn't really work for users with very long-running jobs. E.g. if you spent 6 hours running AzCopy and have a question about it (but it's cleaned up the log because it thought it succeeded), you're not going to want to run the 6-hour job again to get a log.

Even for short running jobs, we're concerned about what effect that model would have on users. E.g. we get a number of performance-related support requests. The job didn't necessarily fail, but just ran slowly. With automatic deletion on success status, those become very difficult to offer support for, because the log gets deleted.

We haven't totally ruled out some kind of automatic deletion in the future, but we want to make sure that we find ways to do it that don't compromise supportability. Therefore, in 10.3.x, we've added jobs clean (and jobs rm) but have not added automatic deletion.
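
For the concurrent-jobs concern above, that means removing a single job's files by ID rather than cleaning everything, roughly like this (a sketch; check azcopy jobs --help for the exact syntax in your version):

azcopy jobs list                 # find the job ID of the finished job
azcopy jobs rm <job-ID>          # remove only that job's log and plan files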

JohnRusk commented 5 years ago

I've created issue #693 to track discussion around automatic clean up. Let's continue this comment thread there.

I'm going to close this issue, #221, now, because the original issue here was more about the location of the log files, rather than automatic cleanup.

bbrummer commented 4 years ago

~/.azcopy/plans is an absolute train wreck.

I just got through fixing a crashed server because this folder, filled with thousands of tiny files, blew out all the inodes on the filesystem, effectively filling it (despite tons of free "space").

WHY are ALL Azure tools such a complete and total trainwreck?! You don't have the most basic functionality working sanely...don't force magic misfeatures like "resume" down our throats especially if you don't know WTF you're doing (especially on Unix).

News Flash: CLI tools get used in cron scripts...that get run over and over and over and over. Must we REALLY be forced to build a full Docker container to run azcopy inside of because it can't even be trusted to not crap all over itself?!

Update: And I just realized why the *.log files aren't causing me the same issue. They were...I just forgot about hitting that mis-feature before and working around it the same way. Here's my code:

# God damn broken Azure tools...make a damn log file EACH COMMAND and burns out all the inodes on the disk!!
rm -f /var/lib/jenkins/.azcopy/*.log

zezha-msft commented 4 years ago

Hi @bbrummer, we are sorry to hear that you did not have a good experience.

To clarify, you can use azcopy jobs clean to remove the logs and plan files.
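
For example, something like this as a periodic/cron cleanup step (a sketch; the --with-status filter is an assumption here, so check azcopy jobs clean --help for the flags available in your version):

# Remove the log and plan files of jobs that have completed
azcopy jobs clean --with-status=Completed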

Please keep in mind that the tool supports many scenarios, including the ones where tons of data are transferred (imagine a million files at once), and it's very possible that a few transfers fail due to network conditions or other unpredictable reasons. Thus, it's actually essential to persist the job status and have the resume feature built in, so that in case of crashes, e.g. someone turned off the VM accidentally, the user can recover their progress.

That being said, we do realize that sometimes the plans/logs create a problem for people who run the tool over and over again. We have this item logged in our backlog, and will tackle it when priority permits.

Please kindly remain courteous so that we can have a productive conversation and take your feedback into consideration in future iterations of AzCopy.

espentveit commented 4 years ago

Feedback: wouldn't an easier solution be a toggle that indicates whether it's a long-running job or not? Azcopy seems to support two workloads: "large transfer/sync with report" and "curl". Indicating that the action is a "curl"-type job would turn off logging to disk and plan files, and show output of failed items directly on the console. "Large transfer with report" would give all the bells and whistles. Maybe release a variant/alias like "azcp" for smaller jobs.

JohnRusk commented 4 years ago

Thanks for the interesting and thoughtful feedback @espentveit.

micktion commented 3 years ago

I've never seen a command-line program that generates log files like this. Output should go to standard output; if you want verbose logging, there should be a switch for that. If you want to create a log file, you should simply redirect the standard output to a log file.

perspark commented 3 years ago

I'm using a Microsoft-hosted build agent with no access to the file system, hence I cannot see any logs from AzCopy. Extremely strange design decision to log to a file 👎

zezha-msft commented 3 years ago

Hi @perspark, you can customize where logs are stored. If you are referring to Azure DevOps build agents, you should have access to some location on the file system at least.

And to clarify, the log is for verbose info to debug issues; the stdout is the concise info that shows progress and job status.

dhirschfeld commented 3 years ago

My azcopy jobs run in an ephemeral container so any files created are destroyed along with the container. The stdout and stderr are logged but with the debug logs sent to a file it seems I have no opportunity to debug any failures?

JohnRusk commented 3 years ago

@dhirschfeld Can you wrap the call to AzCopy in a little shell script that just takes the log file and copies it to stdout or wherever you want? I'm not sure if you can predict its name (I moved off the AzCopy team some time ago, so my memory of log naming is hazy). But, in an ephemeral container, it will be the only file in the log directory.
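
A rough sketch of such a wrapper, assuming the default ~/.azcopy log directory (or AZCOPY_LOG_LOCATION if set) and that the container starts with an empty log directory:

#!/usr/bin/env bash
# Sketch: run AzCopy, then dump whatever log files it produced to stderr,
# so an ephemeral container still surfaces the verbose log before it is destroyed.
set -u

LOG_DIR="${AZCOPY_LOG_LOCATION:-$HOME/.azcopy}"

azcopy copy "$1" "$2" --recursive
status=$?

# In a fresh container this directory holds only this run's log(s).
cat "$LOG_DIR"/*.log >&2 || true

exit "$status"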

dhirschfeld commented 3 years ago

Yeah, that's basically what I've resorted to - cat'ing the log file after the process ends :/

The fact that there's an ugly workaround for a limitation of the tool shouldn't prevent the tool from being fixed, though.

I haven't seen any argument here for why logging to stdout shouldn't be supported as an option. The main line of reasoning seems to be that the logs are needed for support; however, in my situation this limitation actually prevents me from being able to send the logs to support. Logging to stdout doesn't need to be the default, and if it is set, AzCopy could print a warning saying the logs aren't being persisted for future support.

It's then the user's choice where to send the logs, and if they choose to send them to stdout without also tee'ing to a file, then they'll simply have to re-run the job to capture the logs if they want support. If that takes them 6 hours, so be it - they were warned.

The tool's responsibility for preventing users from shooting themselves in the foot should stop at warning them about dangerous actions - it shouldn't prevent users who know what they're doing from doing what they want (IMHO).

JohnRusk commented 3 years ago

Glad to hear you have a workaround that works. I'll leave wider consideration of the points you raise to the current members of the AzCopy team.

zezha-msft commented 3 years ago

Hi @dhirschfeld, thanks for the feedback!

We have work planned to make the logs more concise yet still informative. Perhaps as part of that work item we can consider adding such a flag to pump the log into stdout. It's tricky, though, since we have regular progress updates that are supposed to go to stdout. We will evaluate the UX and make a decision.

TomaszOledzki commented 3 years ago

I would like to highlight something here.

@JohnRusk @zezha-msft you're explaining that the way AzCopy handles log and plan files is made for the sake of users and for their comfort. But AzCopy users are sysadmins, IT professionals and IT staff in general. Your users are complaining here; AzCopy's behavior is bizarre and unexpected :).

dreamflasher commented 2 years ago

Three years later: still an open bug. Please reopen the ticket. azcopy is filling up even a huge HD with log spam.

ubyjvovk commented 2 years ago

👍 please reopen; the current behavior is counter-intuitive in the context of Unix-based systems.

zezha-msft commented 2 years ago

@dreamflasher @debdude thanks for reaching out. Could you please share with us what the ideal behavior for the log file would be?

micktion commented 2 years ago

@zezha-msft Hi, as per my comment from 18 months ago...

https://github.com/Azure/azure-storage-azcopy/issues/221#issuecomment-717724031

Disable diagnostic-level logging by default and provide a switch to enable it. Have informational-level logging go to standard output.

dreamflasher commented 2 years ago

Thanks a lot @zezha-msft - the core problem is that the log and plan files become huge and fill up too much disk space (I regularly have the issue that I can't even run a single azcopy task; sure, it's several million files, but there's enough disk space for the files, just not for the log+plan files).

The obvious solution is to provide an option to turn off writing plan and log files.