barronh / finn2cmaq

Fire emission pre-processor for CMAQ.
8 stars 2 forks source link

How to process a single day? #3

Closed coderodyhpc closed 1 year ago

coderodyhpc commented 1 year ago

Hello, I'm trying to run the FINN2CMAQ components to generate a CMAQ input file for last week following the example. [NOTE: I don't have a Jupyter or similar notebook so it's just the Linux console]. Running get_nrt.py 2023-06-07 downloaded ~FINN/finn2cmaq/scripts/www.acom.ucar.edu/acresp/MODELING/finn_emis_txt/GLOB_GEOSchem_2023158.txt.gz. This seems to have worked fine. The problem is the syntax of the next step (txt2daily.py) and more specifically the arguments. I have 2 questions: -What is GDNAM and which kind of input is FINN2CMAQ expecting? -Because I'm trying to process a single day, which should be the input for YEAR? Is it simply 2023? Another (and more general) question is if FINN2CMAQ takes any projection in GRIDDESC as I'm using Lambert conformal conic. Thanks.

barronh commented 1 year ago

Did you figure it out?

coderodyhpc commented 1 year ago

I figured that out, but the new problem is that pseudonetcdf is failing with an AssertionError because it doesn't like the first line of GRIDDESC. I even edited the file to make sure that the first line reads ' ' but was still getting the error. Other than using a different GRIDDESC file, not really sure what else to try.

barronh commented 1 year ago

Can you post the error?

coderodyhpc commented 1 year ago

Hi @barronh, The error reads:

Traceback (most recent call last):
  File "/home/odyhpc/FINN/finn2cmaq/scripts/./txt2daily.py", line 205, in <module>
    gf = pnc.pncopen(args.GRIDDESC, format='griddesc', GDNAM=args.GDNAM)
  File "/home/odyhpc/.local/lib/python3.9/site-packages/PseudoNetCDF/_getreader.py", line 153, in pncopen
    outfile = reader(*args, **kwds)
  File "/home/odyhpc/.local/lib/python3.9/site-packages/PseudoNetCDF/cmaqfiles/_griddesc.py", line 111, in __init__
    assert(gdlines[-1] == "' '")
AssertionError

Also, I created a very simple MRE:

import PseudoNetCDF as pnc
test = pnc.pncopen('~/FINN/GRIDDESC', format = 'griddesc', GDNAM='12US1')

but it generates a very similar error:

Traceback (most recent call last):
  File "/home/odyhpc/FINN/finn2cmaq/scripts/tonto.py", line 2, in <module>
    test = pnc.pncopen('~/FINN/GRIDDESC', format = 'griddesc', GDNAM='12US1')
  File "/home/odyhpc/.local/lib/python3.9/site-packages/PseudoNetCDF/_getreader.py", line 153, in pncopen
    outfile = reader(*args, **kwds)
  File "/home/odyhpc/.local/lib/python3.9/site-packages/PseudoNetCDF/cmaqfiles/_griddesc.py", line 110, in __init__
    assert(gdlines[0] == "' '")
AssertionError

Thanks.

barronh commented 1 year ago

Python is not going to automatically expand your user directory. So, it doesn’t know what ‘~’ means. From the command line, the ‘~’ is appropriately expanded. That’s why the errors are different.

The first error says that the last line (not the first) is not ‘ ‘.

coderodyhpc commented 1 year ago

It was finally able to pick up the GRIDDESC file, but I had to edit it a bit so that both first and last lines are ' '. Also, I needed to get rid of any comments. After it, I modified txt2daily slightly to accommodate a single day but it's finally generating netCDF files. I still haven't had time to test them with CMAQ.

coderodyhpc commented 1 year ago

Hi @barronh, Sorry to reopen this subject again (I could also open a new thread if necessary) but a couple of hiccups have crept up. 1- Even though I wrote that everything was working OK, and upon further examination, the 3D file is not what CMAQ is expecting. The 2D output is fine but the 3D output only has data for one hour (00) with no further data for any other hour. Tracing back reveals that the variable times only has one item [datetime.datetime(2023, 6, 7, 0, 0, tzinfo=datetime.timezone.utc)]. However, I wonder if it should have another 24 items. Most everything else seems OK (e.g tfactorarr and other variables). However, I noticed that spcf has (in addition to many other data): TSTEP = 240000 and I'm wondering if this could be the root of the problem, I am not really familiar with pseudonetcdf but this is what stood out. [Note: I hope that this explanation is clear enough, but I can provide further details or variable outputs] 2 - The second is about speciation. I'm looking into modifying aux/gc12_to_cb6r3_ae7_v2.txt to accommodate the cb6r5_ae7_a9 mechanism. However, I cannot find the correlations used by the code (checked the CMAQ website including https://github.com/USEPA/CMAQ/blob/main/CCTM/src/MECHS/mechanism_information/cb6r5_ae7_aq/mech_cb6r5_ae7_aq.md). Maybe nothing needs to be modified but could you point to any link describing these equations? Thanks.

barronh commented 1 year ago

My guess is that it didn't complete at some point. After that, it may be caching an incomplete result. For example, if you run on Google Colab free version, you are likely to run out of memory at some point. It can be hard to notice that it crashed.

Just to check, I ran on google colab for the same day using 12US1. It crashed because of memory. I ran again for 36US3 and it worked fine. In short, delete the results and try running again. daily2hourly3d.py can also be adapted to use less memory, but you can also run on a machine with more memory.

image

In addition, be aware that the default layer fractions are configured to run for the 44 layers typically used in the hemispheric platform. Consider replacing the layer allocations with a 35 layer version (or whatever your WRF setup is). That will use less memory to start with.

When you say correlation, I think you are talking about the speciation which I sometimes call species mapping. The mapping between gas-phase species is a best estimate that I and others have developed over time (for emissions and boundary conditions). Most of the aerosol translations for are either fairly direct or came from the Speciate database. If you use the "Browser" you can lookup the the 91102 profile that was used to speciate aerosols. If you choose to adapt, please post your adaptations in a new issue.

coderodyhpc commented 1 year ago

I'm using my own GRIDDESC file (generated with the latest version of MCIP) with a 100 x 100 grid and the system has 8GB. I'm pretty sure it's not a memory problem. The more I take a look at the outputs, the more I suspect of that TSTEP = 240000 assigned in spcf, but cannot figure out how to decrease it. I tried assigning TSTEP equal to 1000 but that didn't fix the issue. It must be something else so I'll try to take a fresh look later this evening.

P.S. I used the word correlation as a general expression but species mapping is definitely more precise.

barronh commented 1 year ago

The spcf.TSTEP is 240000, which appropriate because it is derived from the daily resolution input file (inf). The inf is daily input file and TSTEP is HHMMSS, so 240000 is appropriate there. spcf is the result of inf.copy().eval(evalexpr). That means that it performs algebraic transformations, and the result is still daily -- so 240000 is still appropriate. The outf is a copy of spcf, but has the TSTEP property modified from 240000 to 10000 and the TSTEP dimension extended from 1 to 25. The value in outf is modulated by the tfactor, which is the factor of temporal variation. Thus, the final result is outf.TSTEP is already 10000.

You are, of course, welcome to modify the spcf.TSTEP from 240000 to anything you want. You modify it by assigning it. 10000 is the integer equivalent of 010000. So, the code to modify it is below:

spcf.TSTEP = 10000

To be clear, I do not think that is the problem.

It would be great to see a log of what you have done. Delete all the made files, Re-run all the commands, so that nothing is cached. Copy the commands from your terminal along with their output, and any details you think would allow someone to replicate the issue (e.g., post your GRIDDESC).

Since you are only running 100x100 grid cells, it should be able to run in Google Colab. Testing it there is another way of trying to resolve the issue. The commands below can each be placed in a cell in Google Colab and run. If you run this exactly, you will get the result that I showed, which is a 25 step file with real data in it at all time steps.

!wget -N https://github.com/barronh/finn2cmaq/archive/refs/heads/master.zip
!unzip master.zip
cd /content/finn2cmaq-master/
!pip install -q pseudonetcdf pyproj pycno
!./scripts/get_nrt.py 2023-06-07
!./scripts/txt2daily.py aux/GRIDDESC 36US3 2023 www.acom.ucar.edu/acresp/MODELING/finn_emis_txt/GLOB_GEOSchem_2023158.txt.gz ./GLOB_GEOSchem_2023158.36US3.nc
!./scripts/daily2hourly3d.py ./GLOB_GEOSchem_2023158.36US3.nc ./GLOB_GEOSchem_2023158.36US3.3D.nc
import xarray as xr
f = xr.open_dataset('./GLOB_GEOSchem_2023158.36US3.3D.nc', cache=False)
f

After you run that and it succeeds, upload your GRIDDESC and modify the commands to point to your GRIDDESC using your GDNAM. If it fails there, copy take a screenshot of the errors or output. Below, I assume you uploaded GRIDDESC to the folder that Google Colab starts in (/content) and I assume your GDNAM is CUSTOM.

!./scripts/txt2daily.py /content/GRIDDESC CUSTOM 2023 www.acom.ucar.edu/acresp/MODELING/finn_emis_txt/GLOB_GEOSchem_2023158.txt.gz ./GLOB_GEOSchem_2023158.CUSTOM.nc
!./scripts/daily2hourly3d.py ./GLOB_GEOSchem_2023158.CUSTOM.nc ./GLOB_GEOSchem_2023158.CUSTOM.3D.nc
import xarray as xr
f = xr.open_dataset('./GLOB_GEOSchem_2023158.CUSTOM.3D.nc', cache=False)
f

If it succeeds, you'll have to figure out why it fails on your system.

coderodyhpc commented 1 year ago

That is what I tried (spcf.TSTEP = 10000 and as you anticipated, it didn't fix the problem. I'm on my way out of the door but will try your suggestions later this evening. [P.S. I don't know what is Google Colab but have tried in a Chromebook with a penguin OS and in an Ubuntu system, and both behaved the same fashion]

barronh commented 1 year ago

https://colab.research.google.com

coderodyhpc commented 1 year ago

I have continued testing over the weekend as other commitments have permitted. At least I have a much better grasp of how the app works as some of my preliminary suppositions were a bit off. Anyway, I was able to process a single day using data from a whole year, but processing a single isolated day is still not working properly. The weird thing is that the tensor tied to inv is al zeroes. Not sure why, but I hope to figure it out tomorrow morning and that everything else will then fall into place.

barronh commented 1 year ago

Why don’t you share the GRIDDESC you’re doing. I’ll try run the commands I put for Colab. If they work there and make a 25 hour file, we’ll assume it is an issue on your device.

coderodyhpc commented 1 year ago

Sorry that I forgot to post the GRIDDESC file. It's a pretty simple one:

' '
'CoordName'
  2        42.000        54.000       -77.000       -77.000        40.000
' '
'waw'
'CoordName'   -600000.000    287500.000     12000.000     12000.000  100  100    1
' '

Github doesn't allow to upload a file with no extension, but the equivalent text file is also attached. Anyway, I had no problem processing 07/01/2016 from the whole 2016 year data.

GRIDDESC.txt

coderodyhpc commented 1 year ago

It seems that I finally figured out how to run the app properly. Just need to double-check that the outputs are sound, and they work with CMAQ.