USEPA / EPA_MOVES_Model

Estimating emissions for mobile sources

Need help with splitting the Runspec based on source type #76

Closed HansalShah007 closed 2 weeks ago

HansalShah007 commented 1 month ago

I was reading the document that mentions strategies for making a MOVES run faster: https://github.com/USEPA/EPA_MOVES_Model/blob/master/docs/TipsForFasterMOVESRuns.md

One of the strategies there is splitting RunSpecs by source type. How can we do this? I am attaching a sample RunSpec file for reference. How do we split it by source type?

@danielbizercox can you help me out here? Thanks

<runspec version="MOVES4.0.1">
    <description><![CDATA[]]></description>
    <models>
        <model value="ONROAD"/>
    </models>
    <modelscale value="Rates"/>
    <modeldomain value="SINGLE"/>
    <geographicselections>
        <geographicselection type="COUNTY" key="34005" description="Burlington County, NJ (34005)"/>
    </geographicselections>
    <timespan>
        <year key="2025"/>
        <month id="1"/>
        <month id="2"/>
        <month id="3"/>
        <month id="4"/>
        <month id="5"/>
        <month id="6"/>
        <month id="7"/>
        <month id="8"/>
        <month id="9"/>
        <month id="10"/>
        <month id="11"/>
        <month id="12"/>
        <day id="2"/>
        <day id="5"/>
        <beginhour id="1"/>
        <endhour id="24"/>
        <aggregateBy key="Hour"/>
    </timespan>
    <onroadvehicleselections>
        <onroadvehicleselection fueltypeid="3" fueltypedesc="Compressed Natural Gas (CNG)" sourcetypeid="62" sourcetypename="Combination Long-haul Truck"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="62" sourcetypename="Combination Long-haul Truck"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="62" sourcetypename="Combination Long-haul Truck"/>
        <onroadvehicleselection fueltypeid="3" fueltypedesc="Compressed Natural Gas (CNG)" sourcetypeid="61" sourcetypename="Combination Short-haul Truck"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="61" sourcetypename="Combination Short-haul Truck"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="61" sourcetypename="Combination Short-haul Truck"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="61" sourcetypename="Combination Short-haul Truck"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="32" sourcetypename="Light Commercial Truck"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="32" sourcetypename="Light Commercial Truck"/>
        <onroadvehicleselection fueltypeid="5" fueltypedesc="Ethanol (E-85)" sourcetypeid="32" sourcetypename="Light Commercial Truck"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="32" sourcetypename="Light Commercial Truck"/>
        <onroadvehicleselection fueltypeid="3" fueltypedesc="Compressed Natural Gas (CNG)" sourcetypeid="54" sourcetypename="Motor Home"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="54" sourcetypename="Motor Home"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="54" sourcetypename="Motor Home"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="54" sourcetypename="Motor Home"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="11" sourcetypename="Motorcycle"/>
        <onroadvehicleselection fueltypeid="3" fueltypedesc="Compressed Natural Gas (CNG)" sourcetypeid="41" sourcetypename="Other Buses"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="41" sourcetypename="Other Buses"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="41" sourcetypename="Other Buses"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="41" sourcetypename="Other Buses"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="21" sourcetypename="Passenger Car"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="21" sourcetypename="Passenger Car"/>
        <onroadvehicleselection fueltypeid="5" fueltypedesc="Ethanol (E-85)" sourcetypeid="21" sourcetypename="Passenger Car"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="21" sourcetypename="Passenger Car"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="31" sourcetypename="Passenger Truck"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="31" sourcetypename="Passenger Truck"/>
        <onroadvehicleselection fueltypeid="5" fueltypedesc="Ethanol (E-85)" sourcetypeid="31" sourcetypename="Passenger Truck"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="31" sourcetypename="Passenger Truck"/>
        <onroadvehicleselection fueltypeid="3" fueltypedesc="Compressed Natural Gas (CNG)" sourcetypeid="51" sourcetypename="Refuse Truck"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="51" sourcetypename="Refuse Truck"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="51" sourcetypename="Refuse Truck"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="51" sourcetypename="Refuse Truck"/>
        <onroadvehicleselection fueltypeid="3" fueltypedesc="Compressed Natural Gas (CNG)" sourcetypeid="43" sourcetypename="School Bus"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="43" sourcetypename="School Bus"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="43" sourcetypename="School Bus"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="43" sourcetypename="School Bus"/>
        <onroadvehicleselection fueltypeid="3" fueltypedesc="Compressed Natural Gas (CNG)" sourcetypeid="53" sourcetypename="Single Unit Long-haul Truck"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="53" sourcetypename="Single Unit Long-haul Truck"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="53" sourcetypename="Single Unit Long-haul Truck"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="53" sourcetypename="Single Unit Long-haul Truck"/>
        <onroadvehicleselection fueltypeid="3" fueltypedesc="Compressed Natural Gas (CNG)" sourcetypeid="52" sourcetypename="Single Unit Short-haul Truck"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="52" sourcetypename="Single Unit Short-haul Truck"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="52" sourcetypename="Single Unit Short-haul Truck"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="52" sourcetypename="Single Unit Short-haul Truck"/>
        <onroadvehicleselection fueltypeid="3" fueltypedesc="Compressed Natural Gas (CNG)" sourcetypeid="42" sourcetypename="Transit Bus"/>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="42" sourcetypename="Transit Bus"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="42" sourcetypename="Transit Bus"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="42" sourcetypename="Transit Bus"/>
    </onroadvehicleselections>
    <offroadvehicleselections>
    </offroadvehicleselections>
    <offroadvehiclesccs>
    </offroadvehiclesccs>
    <roadtypes>
        <roadtype roadtypeid="1" roadtypename="Off-Network" modelCombination="M1"/>
        <roadtype roadtypeid="2" roadtypename="Rural Restricted Access" modelCombination="M1"/>
        <roadtype roadtypeid="3" roadtypename="Rural Unrestricted Access" modelCombination="M1"/>
        <roadtype roadtypeid="4" roadtypename="Urban Restricted Access" modelCombination="M1"/>
        <roadtype roadtypeid="5" roadtypename="Urban Unrestricted Access" modelCombination="M1"/>
    </roadtypes>
    <pollutantprocessassociations>
        <pollutantprocessassociation pollutantkey="118" pollutantname="Composite - NonECPM" processkey="1" processname="Running Exhaust"/>
        <pollutantprocessassociation pollutantkey="118" pollutantname="Composite - NonECPM" processkey="2" processname="Start Exhaust"/>
        <pollutantprocessassociation pollutantkey="118" pollutantname="Composite - NonECPM" processkey="90" processname="Extended Idle Exhaust"/>
        <pollutantprocessassociation pollutantkey="118" pollutantname="Composite - NonECPM" processkey="91" processname="Auxiliary Power Exhaust"/>
        <pollutantprocessassociation pollutantkey="112" pollutantname="Elemental Carbon" processkey="1" processname="Running Exhaust"/>
        <pollutantprocessassociation pollutantkey="112" pollutantname="Elemental Carbon" processkey="2" processname="Start Exhaust"/>
        <pollutantprocessassociation pollutantkey="112" pollutantname="Elemental Carbon" processkey="90" processname="Extended Idle Exhaust"/>
        <pollutantprocessassociation pollutantkey="112" pollutantname="Elemental Carbon" processkey="91" processname="Auxiliary Power Exhaust"/>
        <pollutantprocessassociation pollutantkey="119" pollutantname="H2O (aerosol)" processkey="1" processname="Running Exhaust"/>
        <pollutantprocessassociation pollutantkey="119" pollutantname="H2O (aerosol)" processkey="2" processname="Start Exhaust"/>
        <pollutantprocessassociation pollutantkey="119" pollutantname="H2O (aerosol)" processkey="90" processname="Extended Idle Exhaust"/>
        <pollutantprocessassociation pollutantkey="119" pollutantname="H2O (aerosol)" processkey="91" processname="Auxiliary Power Exhaust"/>
        <pollutantprocessassociation pollutantkey="110" pollutantname="Primary Exhaust PM2.5 - Total" processkey="1" processname="Running Exhaust"/>
        <pollutantprocessassociation pollutantkey="110" pollutantname="Primary Exhaust PM2.5 - Total" processkey="15" processname="Crankcase Running Exhaust"/>
        <pollutantprocessassociation pollutantkey="110" pollutantname="Primary Exhaust PM2.5 - Total" processkey="2" processname="Start Exhaust"/>
        <pollutantprocessassociation pollutantkey="110" pollutantname="Primary Exhaust PM2.5 - Total" processkey="16" processname="Crankcase Start Exhaust"/>
        <pollutantprocessassociation pollutantkey="110" pollutantname="Primary Exhaust PM2.5 - Total" processkey="90" processname="Extended Idle Exhaust"/>
        <pollutantprocessassociation pollutantkey="110" pollutantname="Primary Exhaust PM2.5 - Total" processkey="17" processname="Crankcase Extended Idle Exhaust"/>
        <pollutantprocessassociation pollutantkey="110" pollutantname="Primary Exhaust PM2.5 - Total" processkey="91" processname="Auxiliary Power Exhaust"/>
        <pollutantprocessassociation pollutantkey="115" pollutantname="Sulfate Particulate" processkey="1" processname="Running Exhaust"/>
        <pollutantprocessassociation pollutantkey="115" pollutantname="Sulfate Particulate" processkey="2" processname="Start Exhaust"/>
        <pollutantprocessassociation pollutantkey="115" pollutantname="Sulfate Particulate" processkey="90" processname="Extended Idle Exhaust"/>
        <pollutantprocessassociation pollutantkey="115" pollutantname="Sulfate Particulate" processkey="91" processname="Auxiliary Power Exhaust"/>
    </pollutantprocessassociations>
    <databaseselections>
    </databaseselections>
    <internalcontrolstrategies>
    </internalcontrolstrategies>
    <inputdatabase servername="" databasename="" description=""/>
    <uncertaintyparameters uncertaintymodeenabled="false" numberofrunspersimulation="0" numberofsimulations="0"/>
    <geographicoutputdetail description="LINK"/>
    <outputemissionsbreakdownselection>
        <modelyear selected="false"/>
        <fueltype selected="true"/>
        <fuelsubtype selected="false"/>
        <emissionprocess selected="true"/>
        <onroadoffroad selected="false"/>
        <roadtype selected="true"/>
        <sourceusetype selected="true"/>
        <movesvehicletype selected="false"/>
        <onroadscc selected="false"/>
        <estimateuncertainty selected="false" numberOfIterations="2" keepSampledData="false" keepIterations="false"/>
        <sector selected="false"/>
        <engtechid selected="false"/>
        <hpclass selected="false"/>
        <regclassid selected="true"/>
    </outputemissionsbreakdownselection>
    <outputdatabase servername="localhost" databasename="movesoutput2" description=""/>
    <outputtimestep value="Hour"/>
    <outputvmtdata value="false"/>
    <outputsho value="false"/>
    <outputsh value="false"/>
    <outputshp value="false"/>
    <outputshidling value="true"/>
    <outputstarts value="true"/>
    <outputpopulation value="true"/>
    <scaleinputdatabase servername="localhost" databasename="inputdb2" description=""/>
    <pmsize value="0"/>
    <outputfactors>
        <timefactors selected="true" units="Hours"/>
        <distancefactors selected="true" units="Miles"/>
        <massfactors selected="true" units="Grams" energyunits="Joules"/>
    </outputfactors>
    <savedata>

    </savedata>

    <donotexecute>

    </donotexecute>

    <generatordatabase shouldsave="false" servername="" databasename="" description=""/>
    <donotperformfinalaggregation selected="false"/>
    <lookuptableflags scenarioid="temp" truncateoutput="true" truncateactivity="true" truncatebaserates="true"/>
    <skipdomaindatabasevalidation selected="false"/>
</runspec>

danielbizercox commented 1 month ago

What we mean by splitting RunSpecs by source type is to create a separate RunSpec for each source type. The rest of the RunSpec can be identical--same time spans, same input database, same output database--but for a run with 13 source types, you'd end up with 13 RunSpecs. Your motorcycle RunSpec would have this <onroadvehicleselections> portion:

    <onroadvehicleselections>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="11" sourcetypename="Motorcycle"/>
    </onroadvehicleselections>

Your passenger car RunSpec would have the following <onroadvehicleselections> portion:

    <onroadvehicleselections>
        <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="21" sourcetypename="Passenger Car"/>
        <onroadvehicleselection fueltypeid="9" fueltypedesc="Electricity" sourcetypeid="21" sourcetypename="Passenger Car"/>
        <onroadvehicleselection fueltypeid="5" fueltypedesc="Ethanol (E-85)" sourcetypeid="21" sourcetypename="Passenger Car"/>
        <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="21" sourcetypename="Passenger Car"/>
    </onroadvehicleselections>

And so on and so forth.

With these smaller RunSpecs, the intermediate MariaDB joins will be smaller. Smaller joins typically perform better than larger ones, which is why some users may see a performance improvement when running 13 RunSpecs sequentially instead of one single RunSpec.
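
For anyone who wants to automate the split, here is a minimal Python sketch. It assumes each `<onroadvehicleselection>` element sits on its own line, as in the sample RunSpec above. The function name `split_runspec_by_source_type` is hypothetical and not part of MOVES. It filters text lines instead of re-serializing the XML, so everything outside `<onroadvehicleselections>` stays identical to the original file.

```python
import re

# Matches one <onroadvehicleselection .../> element and captures its sourcetypeid.
# The container tag <onroadvehicleselections> does not match because of the \b.
SELECTION = re.compile(
    r'<onroadvehicleselection\b[^>]*\bsourcetypeid="(\d+)"[^>]*/>'
)


def split_runspec_by_source_type(runspec_text):
    """Return {sourcetypeid: RunSpec text that keeps only that source type}.

    Assumption: one <onroadvehicleselection> element per line.
    """
    lines = runspec_text.splitlines(keepends=True)

    # Collect source type IDs in the order they first appear.
    source_type_ids = []
    for line in lines:
        match = SELECTION.search(line)
        if match and match.group(1) not in source_type_ids:
            source_type_ids.append(match.group(1))

    # For each ID, drop every selection line that belongs to another ID.
    result = {}
    for source_type_id in source_type_ids:
        kept = []
        for line in lines:
            match = SELECTION.search(line)
            if match is None or match.group(1) == source_type_id:
                kept.append(line)
        result[source_type_id] = "".join(kept)
    return result
```

Each value in the returned dictionary could then be written to its own file, one per source type. For the sample RunSpec above this would give 13 files.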

HansalShah007 commented 1 month ago

@danielbizercox thanks for describing the strategy. Is it advisable to have separate output databases for each split of the runspec file? Can it improve performance if I run all the splits in parallel with worker partitioning?

danielbizercox commented 1 month ago

Deciding whether or not to use the same output database depends on your post-processing preferences. However, typically we'd recommend using the same output database. When doing so, the only difference in your output between doing 1 run vs. doing 13 runs is that each source type will also have a different MOVESRunID value in your movesoutput table.
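
To illustrate why the extra MOVESRunID values rarely matter for post-processing, here is a small Python sketch. It uses an in-memory SQLite table as a stand-in for the MariaDB `movesoutput` table, with only a handful of its columns and made-up numbers. Grouping by source type and pollutant while ignoring MOVESRunID gives the same kind of totals a single run would.

```python
import sqlite3

# Stand-in for the MariaDB output database; the real table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movesoutput ("
    "MOVESRunID INTEGER, sourceTypeID INTEGER, "
    "pollutantID INTEGER, emissionQuant REAL)"
)

# Made-up rows: run 1 held motorcycles (11), run 2 held passenger cars (21).
rows = [
    (1, 11, 110, 1.5),
    (2, 21, 110, 4.0),
    (2, 21, 110, 1.0),
    (2, 21, 112, 0.5),
]
conn.executemany("INSERT INTO movesoutput VALUES (?, ?, ?, ?)", rows)

# MOVESRunID is left out of the GROUP BY, so the split runs are
# aggregated exactly as if they had come from one run.
totals = conn.execute(
    "SELECT sourceTypeID, pollutantID, SUM(emissionQuant) "
    "FROM movesoutput "
    "GROUP BY sourceTypeID, pollutantID "
    "ORDER BY sourceTypeID, pollutantID"
).fetchall()
```

The same `GROUP BY` query should carry over to MariaDB, since it uses only standard SQL.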

Regarding "worker partitioning", I'm not sure what you mean by that. You can only start one main MOVES process per computer. You can launch additional MOVES workers, which may potentially speed up each individual run, but this typically has a minor impact and we generally do not see much improvement beyond 3 workers.

If you have multiple computers with MOVES installed (e.g., a cluster of VMs), you can launch the RunSpecs in parallel, one per computer, and that will produce output significantly faster. This does result in a separate output database on each computer. To facilitate post-processing in this use case, we have a MOVES Output Grouper tool that can stitch together multiple output databases into a single one: https://github.com/USEPA/EPA_MOVES_Model/blob/master/tools/MOVESOutputGrouper.md

HansalShah007 commented 1 month ago

@danielbizercox by "worker partitioning" I mean that I start multiple sets of workers on multiple command lines, each with a different shared folder configuration. It's a lot of manual work, but I wanted to test this out.

So, essentially:

  1. I change the sharedDistributedFolderPath field in the manyworkers.txt and WorkerConfiguration.txt files and start a group of 3 workers using the command ant 3workers -Dnoshutdown=1.
  2. I then start a MOVES run for one of the splits of the runspec, this time changing the sharedDistributedFolderPath field in the MOVESConfiguration.txt file so that the TODO files are added to the same shared folder as the one configured for the 3 workers started above.
  3. I then repeat steps 1 and 2 for the remaining splits of the runspec, with a different shared folder config for each split.

I store the output of the splits in different output databases.

Will this give me accurate results, or is it not possible to do this? I believe I can do this as long as I have enough logical processors on one computer to run 13 runspec files in parallel, where each one of them has a dedicated set of workers helping it.

danielbizercox commented 1 month ago

It is not possible to run 13 RunSpec files in parallel; each call to ant run -Drunspec=... creates a MOVES main process, and you can only have one main process running per computer. See https://github.com/USEPA/EPA_MOVES_Model/issues/75#issuecomment-2231845461 for more details.
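
Since only one main process can run at a time, the split RunSpecs have to be launched one after another on a single computer. Here is a minimal Python sketch of that loop, built around the `ant run -Drunspec=...` call mentioned above. The function names, the `moves_dir` parameter, and the file names in the usage note are hypothetical. It assumes `ant` is on the PATH and that the command is run from the MOVES installation directory.

```python
import shutil
import subprocess


def build_commands(runspec_paths, ant_executable="ant"):
    """Build one 'ant run -Drunspec=<path>' command per RunSpec file."""
    return [
        [ant_executable, "run", f"-Drunspec={path}"]
        for path in runspec_paths
    ]


def run_sequentially(runspec_paths, moves_dir):
    """Run each RunSpec to completion before starting the next one.

    moves_dir is assumed to be the MOVES installation directory.
    """
    # Resolve the full path so that ant.bat / ant.cmd is found on Windows.
    ant_executable = shutil.which("ant")
    if ant_executable is None:
        raise FileNotFoundError("ant was not found on the PATH")

    for command in build_commands(runspec_paths, ant_executable):
        # check=True stops the loop if a run exits with an error,
        # and subprocess.run blocks until the main process has finished.
        subprocess.run(command, cwd=moves_dir, check=True)
```

Calling `run_sequentially(["st11.xml", "st21.xml"], moves_dir)` would start the second run only after the first has finished, so there is never more than one main process.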

HansalShah007 commented 2 weeks ago

Thanks for the information.