copasi / cloud-copasi

cloud-copasi is a web service for managing COPASI simulation tasks on HPC facilities and compute clouds such as HTCondor and AWS.
GNU General Public License v3.0

temporary (?) debug logging setup may cause issues #11

Closed bdklahn closed 2 years ago

bdklahn commented 2 years ago

I don't know if it caused any issues, but the following setup might be problematic:

https://github.com/copasi/cloud-copasi/blob/6776d73dd2d1413aa1f41c8469c29aa3e430c7b0/cloud_copasi/background_daemon/tools/pool_tools.py#L18-L28

First, the log and check loggers are going to be one and the same, because __name__ is the same in both calls. The assignment to check most likely just creates another variable pointing to the same logger object as log.
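A quick runnable sketch of that point (the logger name here is hypothetical): getLogger() with the same name always returns the very same Logger object.

```python
import logging

# two calls with the same name return the same Logger object
log = logging.getLogger("cloud_copasi.pool_tools_demo")    # hypothetical name
check = logging.getLogger("cloud_copasi.pool_tools_demo")  # not a new logger

print(log is check)  # True: 'check' is just another reference to 'log'
```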

Secondly, I'm not sure of the behavior when basicConfig() is called explicitly here, because it might already have been called implicitly (with default values for format and datefmt) when log is created/assigned. I can't recall whether that happens when an empty handler list is found, or when a logger is created, but I remember there used to be a potential 'gotcha' in the default initialization behavior if you don't explicitly set a handler before logging. From the docs, though, it might not be an issue, because it looks like a default handler is only created if none exists at the time of the first output call (e.g. log.debug()).

Anyway, the way to do what I think is intended there is to create another handler, a file-type handler. And I believe handlers can have their own logging levels, if you want them to differ from the level of the stdout handler.

So maybe first call basicConfig() to instantiate the default logger handler. Then create a file handler, set a custom level on it, and add it to log's list of handlers. It would also be fine to have everything obey a global level. Check the docs . . .
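A minimal, self-contained sketch of that suggestion (logger name and file path are made up; propagation is disabled just to keep the demo isolated). Each handler carries its own level while the logger itself stays at DEBUG:

```python
import logging
import os
import tempfile

log = logging.getLogger("pool_tools_demo")  # hypothetical name
log.setLevel(logging.DEBUG)                 # let all records reach the handlers
log.propagate = False                       # keep this demo self-contained

# console handler at INFO: routine messages only
console = logging.StreamHandler()
console.setLevel(logging.INFO)
log.addHandler(console)

# file handler at DEBUG: everything, including noisy detail
log_path = os.path.join(tempfile.gettempdir(), "pool_tools_demo.log")  # hypothetical path
file_handler = logging.FileHandler(log_path)
file_handler.setLevel(logging.DEBUG)
log.addHandler(file_handler)

log.debug("file only: console handler is set to INFO")
log.info("both console and file")
```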

hasanbaig commented 2 years ago

Thanks Brian for bringing it up. Indeed, we have been having some issues in logging.

I added basicConfig because, for a reason I couldn't determine, it stopped logging the debug messages in the log file. So I made a temporary fix while debugging one particular file, and then replicated it in all of them. There are some other bugs I am working on. Maybe I will create an issue for each of them (as I was doing in the private repo) to keep track of all of them.

pmendes commented 2 years ago

Brian, I have been looking at this and thought the same. I removed one of the loggers, and set the other to a FileHandler. Still the error keeps happening. I managed to suppress the error by simply commenting out the statement: https://github.com/copasi/cloud-copasi/blob/6776d73dd2d1413aa1f41c8469c29aa3e430c7b0/cloud_copasi/background_daemon/tools/pool_tools.py#L33

Now I can't figure out why this is causing an exception, particularly as the statement doesn't even contain a "%" (which is what the error in the backtrace complains about).

By the way, the error still happens even when I replaced the "check" logger with the original "log" logger: log.debug('Refreshing all EC2 pools')

bdklahn commented 2 years ago

Yeah, Pedro, I had doubts that was the issue. But I did want to make sure it was known that the second call to getLogger(__name__) just returns the exact same logger as the first call (which created it, if/because it didn't exist). __name__ is simply the name of the current module (the Python file it's in). Logger objects are globally available by name. getLogger() with no argument returns the main root logger, named 'root'. Using __name__ for the name is a recommended convention which makes it easier to tell which code module a message is coming from, via the default formatted log output. I.e., that name is carried in the record's 'name' attribute and shown whenever 'name' is part of the logger's formatting specification, as it is in the default case:

> python
Python 3.8.12 (default, Aug 31 2021, 01:23:42) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.BASIC_FORMAT
'%(levelname)s:%(name)s:%(message)s'

I think Frank was thinking that the check.debug . . . call isn't the one producing the supposed '%'. So I think he wanted to find/create some trace showing what's in a variable which might get inserted into the format string of such a log message call. I.e., maybe some other call is doing logger.debug("Here is my variable: %s" % myvar). Maybe the string in myvar has a '%' which is messing with the syntax of the format string(?). Maybe during the formatting of one of the message strings, a quote found its way to the right of that second '%', or something, and the logger formatter treated it as an (incomplete) inline format specifier.
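For what it's worth, passing the variable as a separate argument lets logging do the substitution lazily, so a value containing a literal '%' never ends up inside a format string. A contrived, self-contained example (all names made up):

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
log = logging.getLogger("percent_demo")  # hypothetical name
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.propagate = False

myvar = "100% done"  # value containing a literal '%'

# lazy, recommended form: logging interpolates the argument itself
log.debug("Here is my variable: %s", myvar)

print(stream.getvalue().strip())  # Here is my variable: 100% done
```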

The logging module (I believe for backwards-compatibility reasons) still uses the old printf-style format strings for its own internal output settings (the stuff you saw in format= in Hasan's logging config settings). But our code, since it is using Python >= 3.6 (it's on 3.8), can use the newer f-string syntax (as well as the a_string_var.format() style, for some more functionality/control), e.g. logger.debug(f'Here is my_var: {my_var}') (note the 'f' before the first quote) vs. logger.debug('Here is my_var: %s' % my_var) (old printf-style formatting).
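To make the equivalence concrete (my_var is a made-up name), all three styles produce the same message text:

```python
my_var = 42

old_style = 'Here is my_var: %s' % my_var           # printf-style
format_style = 'Here is my_var: {}'.format(my_var)  # str.format style
f_style = f'Here is my_var: {my_var}'               # f-string (Python 3.6+)

print(old_style == format_style == f_style)  # True
```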

I don't know if that will fix/solve anything, but maybe it could help with having one less source of '%'s flying around. :-)

pmendes commented 2 years ago

Brian, I think the problem is indeed in the logger configuration. It turns out that cloud-copasi was not logging to /var/log/messages, but because this is a daemon, when there is an exception it gets logged there. I just looked and the same thing is happening in the httpd log (it also has many exceptions which happen in the main code). I think these are all related to logging, and the best way forward is to carefully change the whole logging setup in all the files.

I plan to have everything logged to ~/log/cloud-copasi.log and ~/log/cloud-copasi-daemon.log (i.e. in the home directory of the user running the services).

bdklahn commented 2 years ago

Pedro, that seems like a good solution. And maybe it is simplest (and probably common practice) to just use Ed's already-present module-level Logger objects, explicitly configuring/adding a file handler. Then it's fine to add/leave logger.debug statements in many places. Those can be quieted by setting the module or global level to INFO or WARN (the default). But, you might agree, we might want the file handler to have DEBUG level, at least for now. I tend to set log files to INFO, and use logger.info('. . .') for common routine things we'd like to note in the program flow, but that other modules importing "this" module as a library might not necessarily want to see (to avoid a lot of noise beyond WARN). Maybe logger.info(f'now writing to file: {path}'), etc. Then I reserve logger.debug() for inspecting the guts of certain objects, etc. And sometimes I'll use pprint's pformat to make Python objects like dictionaries or lists a little easier to read, e.g.

from pprint import pformat
. . .
some_dict = dict()
. . .
logger.debug(f'some_dict: {pformat(some_dict)}')

Also note this f-string shortcut (with an included '='): logger.info(f'{my_var=}') gives the same output as logger.info(f'my_var={my_var}')
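A tiny runnable version of those two tips (the dictionary contents are made up):

```python
from pprint import pformat

some_dict = {"pool": "ec2", "vms": [1, 2, 3]}  # made-up data
my_var = 42

print(f'some_dict: {pformat(some_dict)}')  # pretty-printed dict
print(f'{my_var=}')                        # prints: my_var=42
print(f'my_var={my_var}')                  # same output, spelled out
```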

pmendes commented 2 years ago

The error is deeper than the issue of the two concurrent logs. I removed the check log that Hasan had created, so that everything goes into "log", but still any message that we try to log causes the exception. I think this must be in the definition of the log. I suspect this is a Python 2->3 issue. Most likely there are changes in the way the log has to be defined.

bdklahn commented 2 years ago

Hmmm . . . maybe so. Well, I guess we might look at the places where logging stuff is imported and used, and see whether it is consistent with the logging docs for Python 3.8 (the lower "Advanced" -> "Configuring Logging" section might be the more helpful part).

hasanbaig commented 2 years ago

Brian, it is first initialized in the Django settings.py file. The settings.py file running on the server has the following logging configuration:

#Log levels: critical, error, warning, info, debug
LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'verbose': {
            #'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
            #the following format is updated by HB
            'format': '%(levelname)s %(asctime) %(message)s', 
            'datefmt': '%m/%d/%y %I:%M:%s %p'
        },
        'simple': {
            'format': '%(levelname)s %(message)s'
        },
    },
    'handlers': {
        'null': {
            'level': 'DEBUG',
            'class': 'logging.NullHandler',
        },
        'console':{
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'simple'
        },
        'file' : {
            'level': 'DEBUG',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': os.path.join(LOG_DIR, 'cloud-copasi.log'),
            'maxBytes': 1024*1024*5, # 5 MB
            'backupCount': 10,
            'formatter':'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['file'],
            'propagate': True,
            'level': 'INFO',
        },
        'cloud_copasi': {
            'handlers': ['console', 'file'],
            'level': 'DEBUG',            
        }
    }
}

We also need to look at Django Logging.
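For reference, outside Django the same kind of dict is applied with logging.config.dictConfig(). A trimmed, self-contained sketch (the 'cloud_copasi' logger and 'console' handler names follow the config above; the rest is made up for the demo):

```python
import logging
import logging.config

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {'format': '%(levelname)s %(message)s'},
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'simple',
        },
    },
    'loggers': {
        'cloud_copasi': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}

logging.config.dictConfig(LOGGING)
log = logging.getLogger('cloud_copasi')
log.debug('logger wired up via dictConfig')
```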

bdklahn commented 2 years ago

Ah. Right. Maybe the old Django settings.py doesn't have some of the right syntax for the Python 3.x config.

I guess we can see what the/a newer Django settings.py example looks like?

I guess that's what you might have already done, in the process of updating that format line?

hasanbaig commented 2 years ago

I am not entirely sure, but yes, we should, and I will look into it.

bdklahn commented 2 years ago

I just looked at one of the examples in that Django Logging ref you shared @hasanbaig (thanks).

One interesting thing I see in one example is . . .

'format': '{levelname} {asctime} {module} {process:d} {thread:d} {message}',

That doesn't seem to use the '%(levelname)s'-like syntax.

hasanbaig commented 2 years ago

Yes, I have already checked that out, but the description mentioned below the example about "formatter" refers to the same python logging page:

I am quoting it below: The format string is a normal Python formatting string describing the details that are to be output on each logging line. The full list of detail that can be output can be found in [Formatter Objects](https://docs.python.org/3/library/logging.html#formatter-objects).

bdklahn commented 2 years ago

Hmm. Yes. But I wonder if, in the new Django, the default formatting style is set to '{' and not '%' somewhere. I.e., maybe the style in this class constructor is overridden by Django? class logging.Formatter(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None) But maybe that's what you are saying, that you checked (read further?) and it is not. But maybe 'version': 1 is supposed to handle using that "old" style?
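One way to see how the style argument and the format string interact, assuming Python 3.8+ where Formatter validates the format against the style by default:

```python
import logging

# matching style: accepted
logging.Formatter('{levelname} {message}', style='{')

# default style is '%'; a '{'-style string has no %(...)s fields,
# so validation (validate=True by default since 3.8) rejects it up front
try:
    logging.Formatter('{levelname} {message}')
except ValueError as err:
    print('rejected:', err)
```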

I wonder what might happen if we completely leave out the format specification in settings.py, and see if the default format used works.

I might guess that Ed only explicitly set that to get extra information like 'process' and 'thread' for the verbose case (and make the simple case quieter).

hasanbaig commented 2 years ago

Maybe you are right! I have not looked deeper into it. I will try that out locally on my system.

bdklahn commented 2 years ago

Yeah. I dunno. From this . . . https://betterstack.com/community/guides/logging/python/how-to-start-logging-with-django . . . it certainly seems like the default is still "%", and you have to explicitly set style: '{' to use that. I guess we can try temporarily removing the format (or replacing it with the new format) and see. Puzzling.

Maybe we can just "upgrade" to the '{' style, either way.

I have come across StackOverflow issues where a variable name with a '%' in it caused some problems. So maybe using the '{' style would work around any of those potential issues(?).

pmendes commented 2 years ago

I don't think the problem comes down to formatting, but we can certainly change to the new format, though I would do that only after ruling out the setup. I am going to work on this later today, but I already see that the settings.py file contains two loggers, one called 'django' and another called 'cloud_copasi'; the latter has two handlers, one of them to the console, which I think may be a problem (neither the httpd nor the daemon has a console associated...)

My plan is to examine the entire logging aspect on the cloud-copasi-daemon first and fix the problem there. When that is sorted out we can then move to the httpd/wsgi part.

I will put all logs in files inside the folder ~/log, one called cloud-copasi.log (for httpd/wsgi part) and the other cloud-copasi-daemon.log (for the daemon).

fbergmann commented 2 years ago
   'format': '%(levelname)s %(asctime) %(message)s', 

this is the issue .. there is a missing 's' after %(asctime); it has to be %(asctime)s
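A quick check of the corrected format string, captured to a StringIO so it is self-contained (the logger name is made up; the datefmt here uses uppercase %S, the strftime code for seconds):

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    '%(levelname)s %(asctime)s %(message)s',  # note the 's' after %(asctime)
    datefmt='%m/%d/%y %I:%M:%S %p'))

log = logging.getLogger('asctime_fix_demo')  # hypothetical name
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.propagate = False

log.debug('Refreshing all EC2 pools')
print(stream.getvalue().strip())
```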

hasanbaig commented 2 years ago

Thanks @fbergmann . It removed the error messages.

@bdklahn: Replacing 'format' and 'datefmt' with:

'format': '{levelname} {asctime} {module} {process:d} {thread:d} {message}',
'style': '{',

also worked. There seem to be no errors now. So, as Frank mentioned, we can also just add the missing 's' to make it work.
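And the '{'-style variant can be checked the same way (captured to a StringIO; the logger name is made up):

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    '{levelname} {asctime} {module} {process:d} {thread:d} {message}',
    style='{'))

log = logging.getLogger('brace_style_demo')  # hypothetical name
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.propagate = False

log.debug('brace-style formatting works')
print(stream.getvalue().strip())
```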

bdklahn commented 2 years ago

Makes sense, @fbergmann (I guess it pays to look more carefully :-) ). Then the '%' (or the next one) is interpreted as an ambiguous '%' directive, showing up in the error log output, since the parser finds no valid format specifier following it ('s' == string, 'd' == decimal integer, etc.)

If the parser doesn't even expect that style (for its own format string; probably not an issue for 'message' strings passed in, as they might be parsed before assignment), i.e. when setting and using style: '{', then the format with the erroneous syntax was simply never used.

pmendes commented 2 years ago

How many pairs of eyes are needed until we spot the lone % ? I had been looking for this error for hours last night...

pmendes commented 2 years ago

We've made the changes and now we have two loggers, one for the daemon and another for the web code. No more errors appear in the system logs.