Open pmartien opened 2 years ago
We have finished the test of the multi-day InMAP with 1-km CMAQ, and have sent the scripts.
Thanks, @bkoo-git, @yuzhou-wang. Any issues to report? What are next steps?
I've finished testing the scripts prepared by @yuzhou-wang on our cluster machine. I believe the next step will be building InMAP using the preprocessed WRF/CMAQ data. @yuzhou-wang, any guidance?
Runtime for preprocessing the 1-km WRF and CMAQ data for InMAP for the whole year of 2018:
Thanks @bkoo-git for the status update and for the questions about next steps! @yuzhou-wang, @bujinb: can you provide a status update? Do you see any issues with what @bkoo-git provided? I'm trying to encourage more discussion and updates via GitHub so we can facilitate quick turn around on simple blockers to our collective progress. If this doesn't work I will call for more frequent project Zoom meetings, which I think will be less efficient & productive. :-)
Hi InMAP-SFAB team, Any progress to report? Updates for the group? Thanks!
I'm still working on testing the 1-km data, and will provide feedback by the end of this week.
Great. Thank you @yuzhou-wang ! I look forward to your feedback!
I have run InMAP and ISRM on Google Cloud. I'm currently working on running multiple InMAP runs in parallel on Compute Engine, but am running into issues.
@bujinb Could you please clarify the issues? Is there anything wrong with the InMAP data file I processed, or are you having issues with running InMAP on the cloud?
@bkoo-git I am primarily working on running InMAP on the cloud, and I'm having issues with running multiple InMAP runs in parallel there. @yuzhou-wang is working with the CMAQ outputs.
Thanks, @yuzhou-wang, @bujinb! So it sounds like the processed files we handed off to you are okay, but setting up multiple runs on the Google Cloud processors is an issue? Is the issue specific to InMAP, or does it affect any process run on multiple processors? Thanks for posting updates on GitHub!
@pmartien The Google engineer thinks it is an InMAP issue, but Chris has used Kubernetes to run InMAP in parallel before, so it might not be an InMAP issue. We are trying to have regular meetings with Chris for help. We'll try to update on GitHub as much as we can. Thanks!
@bujinb: Got it! Let us know if there's any way we can be helpful.
I tested the new InMAP several times, but kept running into the same problem: it generated infinite concentrations using the emissions in San Francisco. I'm still trying to find the cause. I will post an update when I find the cause or solve the problem.
Thanks, @yuzhou-wang ! Let us know if there is any indication that the files we provided are causing/contributing to this problem. Are you only seeing the problem when submitting multiple InMAP runs? Or does it also occur with a single run? Thanks again.
@pmartien I tried both single and multiple InMAP runs, and used both one-day and whole-year InMAP data. All the tests generate infinite numbers. I looked into the InMAP data and found that there seems to be a problem with the calculation of dry deposition. Tracing further back to the wrfcmaq data, there are missing values in three wrfcmaq variables (rain water mixing ratio, cloud water mixing ratio, cloud fraction). I'm not sure whether the problem is caused by the WRF data itself, or by my calculation (deriving the wrfcmaq data from WRF and CMAQ). I'll look into the WRF data and try to find the cause of the problem in the following days.
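A quick way to screen for this kind of problem is to count non-finite values per variable before building the InMAP data. The following is only a minimal sketch: the function name, the variable names, and the sample values are hypothetical, and real fields would be read from the wrfcmaq files rather than typed in by hand.

```python
import math


def count_non_finite(fields):
    """Return {variable name: count of NaN/inf values} for variables that have any."""
    report = {}
    for name, values in fields.items():
        bad = sum(1 for v in values if not math.isfinite(v))
        if bad:
            report[name] = bad
    return report


# Hypothetical, hand-made values standing in for flattened model fields.
sample = {
    "rain_water_mixing_ratio": [0.0, float("nan"), 1.2e-5],
    "cloud_fraction": [0.1, 0.4, 0.0],
}

report = count_non_finite(sample)
print(report)
```

Variables that show up in the report are candidates for a closer look; an empty report means every value in the screened fields is finite.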
I have figured out the problem: there is a mismatch between the WRF layers and the wrfcmaq layers. The wrfcmaq vertical layers should start from 0 (ground level), but they started from -1 due to a small error in the Python preprocessing code. I revised the Python code and generated a new one-day InMAP data file, and it runs correctly. So I guess we need to redo the whole-year InMAP preprocessing using the revised code. I'll run more tests to make sure that the revised code generates correct results. I'll send @bkoo-git the updated InMAP preprocessing code and a detailed guide to running the new InMAP this week.
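For anyone wondering why an off-by-one start index can go unnoticed in Python: a start index of -1 does not raise an error, it silently wraps around to the last element. The sketch below is only an illustration of that general behavior with made-up layer labels and a hypothetical helper; it is not the actual preprocessing code, and the real bug may have manifested differently.

```python
def pick_layers(profile, start, count):
    """Select `count` consecutive layers beginning at index `start`."""
    return [profile[start + k] for k in range(count)]


# Hypothetical layer labels ordered from the ground (index 0) upward.
wrf_layers = ["ground", "layer1", "layer2", "top"]

correct = pick_layers(wrf_layers, 0, 3)   # starts at the ground layer
shifted = pick_layers(wrf_layers, -1, 3)  # off by one: -1 wraps to the top layer

print(correct)
print(shifted)
```

Because no exception is raised, a check that the first selected layer is the ground layer is a cheap guard against this class of error.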
@yuzhou-wang Thanks for fixing the error! I'll re-process the wrfcmaq data once I receive the updated code.
Thanks @yuzhou-wang, @bujinb for isolating this problem! And for keeping us updated on github; super helpful!
We are still working on running multiple InMAP runs in parallel on Google Cloud. Chris's Kubernetes cluster can run 1250 InMAP runs at the same time, and we are hoping we can do the same or better. I heard from Yuzhou that your cluster is fast; I was wondering if we can run InMAP in parallel on your cluster, @bkoo-git. Can we schedule a quick meeting? Thanks, Bujin
Status update:
Great work, all. Thanks for the updates @bkoo-git!
@bkoo-git @bujinb I'd also like to join the meeting about running InMAP in parallel. Can you send me the link? Thanks!
@bkoo-git, @yuzhou-wang, Jeff and I had our meeting on running InMAP on your local cluster. It seems like running InMAP on the cloud will be the faster way to generate the new ISRM, as we can potentially run a thousand InMAP runs in parallel once we learn how to use Kubernetes. We have sent instructions for running InMAP on a local computer (in the Google Drive). @bkoo-git, please update us when you try running it on your cluster.
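As a rough illustration of the planning involved, a set of runs can be split into batches that each fit within the number of runs the cluster can host at once. This is a minimal sketch only: the helper name, the total of 2500 runs, and the limit of 1000 concurrent runs are hypothetical placeholders, not figures from our setup.

```python
def make_batches(run_ids, max_parallel):
    """Split run IDs into consecutive batches of at most `max_parallel` runs."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    return [run_ids[i:i + max_parallel] for i in range(0, len(run_ids), max_parallel)]


# Hypothetical numbers: 2500 runs, at most 1000 running at the same time.
batches = make_batches(list(range(2500)), 1000)
print([len(b) for b in batches])
```

Each batch could then be submitted as one wave of jobs, with the next wave starting once the previous one finishes.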
Status update:
Thanks for the status updates, @bujinb and @bkoo-git. This sounds like good progress.
I've tested the new InMAP using the year-2016 EGU emissions in San Francisco. The concentration estimates at 1-km resolution are 2 to 3 times higher than the estimates from the national InMAP, but still look reasonable. I'll run more tests using other emission files.
@bkoo-git In case you need help running InMAP on your local cluster, @yuzhou-wang and I are available. Update on running InMAP on the cloud: we are still troubleshooting our attempts at running InMAP on Kubernetes Engine.
Thanks, @bujinb! I did test the 2005 NEI test case (from the InMAP release page) on the District cluster, and the results look reasonable. However, I believe a better test would be to reproduce the results of the Bay Area test case @yuzhou-wang did using the InMAP data file created from the 2018 CMAQ/WRF data. I've asked @yuzhou-wang for the input files she used for her test, and received the files today. I will try to replicate her test case on our cluster this week and report back to you guys~
Thanks again all for the status update. Much appreciated!
A quick update: I successfully ran @yuzhou-wang's SF test case on the District cluster and verified that my results and hers are identical. She said the test run took 1.5 hours on her lab computer. It took 44 minutes on the District machine (soma). As I believe the InMAP code is not threaded, I think the runtime difference simply reflects the clock speed difference between the processors used. Also, note that the test case doesn't include the full set of emissions used in our 2018 base case CMAQ simulation.
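For reference, the two runtimes quoted above work out to roughly a factor of two. A minimal sketch of the arithmetic (the variable names are just for illustration):

```python
lab_minutes = 1.5 * 60   # 1.5 hours on the lab computer
district_minutes = 44    # District machine (soma)

speedup = lab_minutes / district_minutes
print(f"speedup: {speedup:.2f}x")
```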
Great news, @bkoo-git. What should our next steps be? Should we meet to discuss?
I think now might be a good time for another meeting to get everyone on the same page and discuss the next step. I wonder whether the InMAP results (if all emissions are included) naturally match our annual CMAQ results, since we built the InMAP baseline chemistry input data using the full 2018 CMAQ outputs. I'd like to hear from the InMAP developers on this. If we still need to evaluate how well InMAP replicates the annual CMAQ results, we'd need to develop InMAP emission inputs that are consistent with the 2018 CMAQ emissions inputs, re-run InMAP, and compare the InMAP results with our CMAQ results.
I think this comparison is important. Although the new InMAP was built on the CMAQ outputs, the annual predictions can still be slightly different since InMAP is linear.
We can discuss the emission inputs needed by InMAP. @bkoo-git Do you have emission inputs that are aligned to the CMAQ grids (1 km or 4 km)?
We discussed the emissions input formats in Issue #2 and determined that SMOKE-formatted files (such as ORL or FF10) would be the easiest option if we want to retain the source info (e.g., SCC). @yuzhou-wang, do you have a sample test case that uses SMOKE-formatted emissions input files?
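To make the point about linearity concrete: in a linear source-receptor model, the concentration from combined emissions equals the sum of the concentrations from each source run separately, which is not generally true of a model with nonlinear chemistry. The sketch below uses a made-up 2x2 coefficient matrix and a hypothetical function name purely for illustration; it is not InMAP's internal calculation.

```python
def concentrations(sr_matrix, emissions):
    """Linear source-receptor model: c[r] = sum over s of sr_matrix[s][r] * emissions[s]."""
    n_receptors = len(sr_matrix[0])
    return [
        sum(sr_matrix[s][r] * emissions[s] for s in range(len(emissions)))
        for r in range(n_receptors)
    ]


sr = [[0.5, 0.1], [0.2, 0.4]]  # made-up coefficients: rows are sources, columns are receptors
e1 = [10.0, 0.0]               # emissions from source 0 only
e2 = [0.0, 5.0]                # emissions from source 1 only

# Superposition: one run with both sources vs. the sum of two single-source runs.
combined = concentrations(sr, [a + b for a, b in zip(e1, e2)])
summed = [x + y for x, y in zip(concentrations(sr, e1), concentrations(sr, e2))]
print(combined, summed)
```

The two results agree for any linear model, so differences against the annual CMAQ results would reflect what the linear approximation cannot capture.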
I've made a comparison between the national InMAP and the new InMAP, using all NEI 2016 point emissions in the Bay Area. I've attached the comparison slides. I compared the results at both 1-km and 10-km spatial resolutions. The domain-wide mean of the total PM2.5 predictions from the new InMAP is around 2-3 times the national InMAP predictions. The biggest difference is for SO2, for which the new InMAP predicts much higher concentrations than the national InMAP. I've also looked at the total SO2 emissions in California, and found that they dropped by roughly an order of magnitude from 2005 to 2018 (160 ton/year to 20 ton/year). This large change in SO2 emissions may have changed the sensitivity of PSO4 to SO2.
The good thing is that, in the 1-km resolution comparisons, the predictions from the new InMAP seem more precise. The new InMAP also seems to capture the emission sources well.
We plan to make more comparisons of the new InMAP to CMAQ, and of the new InMAP to monitoring concentrations, to see how well the new InMAP performs.
comparison_inmap.pptx
@bkoo-git Could you send me a sample of SMOKE-formatted emissions input files? I'd like to make some test runs using that format. I don't have a sample SMOKE-formatted emissions input handy.
Thanks @yuzhou-wang for sharing your comparison results. @stephenreid65 can provide you with sample SMOKE-formatted emissions input files. I have a question: Can you use different emissions input formats in a single run? For example, can you list a SMOKE-formatted emissions input for a source category and a shapefile for another category in the same TOML?
@bkoo-git I'm not sure about that. I'll make some test runs that include both shapefile and SMOKE-formatted emissions. I believe the default InMAP configuration only takes shapefiles, so we may need to do some preprocessing to convert the SMOKE-formatted emissions to shapefiles.
I was asking because not all emissions are generated by SMOKE. Sea spray emissions are internally generated by CMAQ at runtime: they can be made available via diagnostic outputs in a netCDF format, which could be converted to a shapefile, but formatting them into a SMOKE inventory file wouldn't be desirable.
@yuzhou-wang If we have to convert the SMOKE-formatted emissions to shapefiles, wouldn't we lose source info in the process? Then what's the purpose of using SMOKE-formatted emissions? I notice that your test case emission inputs don't retain source info like SCC. Why do we want to keep source info like SCC in the emissions input?
Thanks for sharing the comparison slide deck, @yuzhou-wang. That's very interesting. @bkoo-git, are we seeing high PSO4 levels in CMAQ runs?
Annual average PSO4 predicted by CMAQ can be high near high SO2-emitting sources, but the max was ~10 μg/m3. Peak PSO4 predicted by InMAP appears to be much higher than what CMAQ predicted even though the InMAP run includes point source emissions only.
@yuzhou-wang, I can provide SMOKE-ready emissions inputs, but they would basically be CSV files with annual emissions by county or facility. I think you would need something gridded, so would our spatial surrogates also be required? We don't have emissions in shapefile format right now.
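To illustrate what the spatial surrogates would be used for: each county's annual total is spread over grid cells in proportion to surrogate fractions that sum to 1 within the county. The following is a minimal sketch with made-up county names, totals, and fractions; the function name and data layout are hypothetical and do not reflect the SMOKE file formats.

```python
def allocate_to_grid(county_totals, surrogates):
    """Spread each county's annual total over grid cells by surrogate fraction."""
    gridded = {}
    for county, total in county_totals.items():
        for cell, fraction in surrogates[county].items():
            gridded[cell] = gridded.get(cell, 0.0) + total * fraction
    return gridded


# Made-up annual totals (tons/year) and surrogate fractions keyed by (row, col).
county_totals = {"county_A": 100.0, "county_B": 40.0}
surrogates = {
    "county_A": {(0, 0): 0.75, (0, 1): 0.25},
    "county_B": {(0, 1): 0.5, (1, 1): 0.5},
}

gridded = allocate_to_grid(county_totals, surrogates)
print(gridded)
```

A cell that overlaps two counties, such as (0, 1) here, receives a share from each, and the gridded total matches the sum of the county totals as long as the fractions in each county sum to 1.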
@bkoo-git @stephenreid65 Since the comparison is mostly meant to make sure that the new InMAP provides reasonable predictions, we may not need emissions with detailed source info. I think we can use a combined emission file if one is available. Or, if you have CMAQ predictions from a single source, I can also run the new InMAP using that single emission file. Do you have any suggestions on that?
We have discovered that the VOC mappings in the wrfcmaq2inmap preprocessor weren't updated for the SAPRC07 chemical mechanism, which was used in our Bay Area CMAQ modeling, so many VOC species were dropped during preprocessing. We therefore need to redo the preprocessing. Since we are running a new 2018 base case CMAQ simulation at the moment, I propose preparing the InMAP input data using the new simulation outputs. The new simulation will also generate additional diagnostic outputs for sea spray emissions, which can be used later for evaluating InMAP. Meanwhile, I will work with @yuzhou-wang to fix the VOC mappings in the preprocessor. Let me know if you have any comments, suggestions, or questions.
We have successfully built the Kubernetes setup needed for running InMAP on the cloud in parallel to make a new ISRM, but are still in the process of testing the command. Meanwhile, I have run the new InMAP on several locations and made example test results. Please provide suggestions. example test results inmap.pptx
Hi @bujinb, @yuzhou-wang, and all. Thanks for the update and for sharing these test runs. Following up on earlier comments on this issue, I think it may be a good time to schedule a meeting to discuss next steps. I'll follow up with an email with some suggested dates.
We have successfully run InMAP and made a small (16 grid cells) ISRM for testing purposes on Google Cloud. Now we are testing bigger runs with more grid cells to get an idea of how long the process will take and how much it will cost. Before we have the meeting in 2 weeks, do you have any suggestions on the example test results I posted above? We will shift to 2020 census data soon. Thanks, Bujin