bbfrederick / rapidtide

rapidtide - a suite of programs for doing time lag correlation analysis on fMRI data
Apache License 2.0
75 stars 14 forks source link

Switch from printing to logging messages #41

Open tsalo opened 4 years ago

tsalo commented 4 years ago

Is your feature request related to a problem? Please describe. Printing messages throughout the workflow doesn't let you control what level of information you see as a user. If you switch to using logging, then you can define the log level of each message, so devs can run with debugging-level messages, while users can run with just warnings and errors.

Describe the solution you'd like Throughout the library, swap print statements out with logging.

Describe alternatives you've considered None, really. Logging would be helpful, and the logging library is a built-in, so there are no additional dependencies.

Additional context None

EDIT: I'm happy to help with this once dev is merged in. Working on it before then would probably introduce a bunch of merge conflicts.

bbfrederick commented 4 years ago

I agree - this is a good idea but let's hold off until the merge.

tsalo commented 3 years ago

I think I have a rather clean way of handling the logging. I just have a couple of questions:

  1. How important is the extra info in the timing file, like the voxels in the following line? https://github.com/bbfrederick/rapidtide/blob/6e37daefeef7f9b817743aead74fc9fe71e5d4fd/rapidtide/workflows/rapidtide.py#L753
  2. Why do you use sys.exit() instead of raising an Exception (see below for an example)? If I switched to using exceptions, would that be a problem for the workflow? https://github.com/bbfrederick/rapidtide/blob/6e37daefeef7f9b817743aead74fc9fe71e5d4fd/rapidtide/workflows/rapidtide.py#L94
  3. EDIT: One more... Your current logging (print statements) includes a lot of styling like extra newlines for different sections. Would you prefer to have a text file with that formatting retained, but without the ability to trace back the source of the messages to individual functions, or a tab-delimited file where only core messages (i.e., empty lines, no asterisks, no indents) are retained, but they come with stuff like the time and the function from which the message was generated?
bbfrederick commented 3 years ago
  1. The extra info is kind of important - that's how I compare run speeds with the multiprocessor and multithreaded versions of the code - voxels per second completed. Helps me normalize across different types of input data.
  2. Because I didn't know what I was doing when I started writing rapidtide :-) Switching to exceptions would be better practice, and I'm all for converting. I've patched up a few specific routines, but hadn't really thought about a better, more consistent way of error handling.
  3. I'm not sure I have a strong opinion one way or another. I'm used to the formatting the way it is, but that doesn't mean it's the best way to do things. It is a bit busy. I do tend to use it for debugging when people have a problem, so I guess it should be as informative as possible if there's an error, but maybe a bit less verbose if there isn't, if that makes sense. One thing that would be useful would be to suppress all progress bars if you aren't writing to stdout - that tends to make a mess of the output files.
tsalo commented 3 years ago
  1. The extra info is kind of important - that's how I compare run speeds with the multiprocessor and multithreaded versions of the code - voxels per second completed. Helps me normalize across different types of input data.

Okay, I can incorporate it into the timing logger then.

  1. Because I didn't know what I was doing when I started writing rapidtide :-) Switching to exceptions would be better practice, and I'm all for converting. I've patched up a few specific routines, but hadn't really thought about a better, more consistent way of error handling.

Fair enough! I can use exceptions and logging.errors in my PR.

  1. I do tend to use it for debugging when people have a problem, so I guess it should be as informative as possible if there's an error, but maybe a bit less verbose if there isn't, if that makes sense.

I can stick with a freeform text file for now, and then we can revisit if you want to leverage more of the features logging provides.

I'll open a draft PR today and you can see what I'm thinking.

tsalo commented 3 years ago

One thing that I'm stuck on is the memprofile option. If memprofile is False, then memory information is logged to MemoryLGR. What is the expected behavior if memprofile is True?

bbfrederick commented 3 years ago

Ah - that's a little obscure. I have a wrapper function, addmemprofiling. If memory_profiler is install, then memprofile is True then that wraps functions in a call to profile(), a function from memory_profiler that gives detailed memory usage information (more detailed that what the tide_util function logjam provides). I have an unresolved issue with extremely large files on linux. It seems to never actually release memory when objects are deleted - I think the term is "high water mark allocation". On the mac, the VMEM happily goes up and down as I clear out arrays. On linux, it only ever goes up, and gets absurd (57GB when processing HCP resting state data). Adding all the profiling stuff was part of an (abandoned) effort to get to the bottom of that. I still intend to fix it someday. I don't think having it in there is harming anything for the most part - memory_profiler maintains its own file. I suppose we could change the logic - instead of doing one or the other, always log to to logmem, but if memory_profiler exists, also add the profiling wrapper, which will write to its own file (instead of NOT logging to logmem if memory_profiler is there).