Esri / large-network-analysis-tools

Tools and code samples for solving large network analysis problems in ArcGIS Pro
Apache License 2.0

Differing results when using arcpy vs. ArcPro #47

Closed WYMcCunE closed 5 months ago

WYMcCunE commented 5 months ago

I am running the tool with the same set of parameters in two environments: ArcGIS Pro and a Python environment (Esri 3.9.11). The origins are a point feature class in an ArcSDE with 17,000 points, the destinations are a point feature class in the same SDE with 45 points, and the network data source is the generic Esri portal with the Trucking Distance travel mode. The two runs give different output when I join the output CSV back to the output origins on "ORIGIN_OID". The CSV directory is a temporary folder.

    arcpy.LargeNetworkAnalysisTools.SolveLargeODCostMatrix(
        Origins=origins_fc,
        Destinations=destinations_fc,
        Network_Data_Source=network_portal,
        Travel_Mode="Trucking Distance",
        Time_Units="Minutes",
        Distance_Units="Miles",
        Max_Inputs_Per_Chunk=1000,
        Max_Processes=4,
        Output_Updated_Origins=out_origins_fc,
        Output_Updated_Destinations=out_destinations_fc,
        Output_Format="CSV files",
        # Output_OD_Lines_Feature_Class=None,
        Output_Folder=csv_dir,
        Cutoff=75,
        # Sort_Inputs=True
        # Num_Destinations=None,
        # Time_Of_Day=None,
        # Barriers=None,
        # Precalculate_Network_Locations=None
    )

Pro output joined to CSV: [image: pro-output]

Python command output (run in FME) joined to CSV: [image]

mmorang commented 5 months ago

Interesting. You said you joined the output CSV back to the output origins. Did you do the join in Pro? Instead of looking at the image or analyzing the joined feature classes, can you use the Feature Compare tool to compare the output origins (without the join) for both cases and compare the output CSV using some kind of text file comparator or pandas or something? Is there any difference there? If not, the difference would seem to be in the join process.

Also, did you run both analyses around the same time? If you're using the ArcGIS Online services (sounds like you are), we recently made a major data update. That could cause some differences in results if you ran one before the update and one after. If you think this applies to you, you can give me the date you ran the analyses, and I can let you know if it seems likely to be affected by those data updates.

The second image looks really suspicious, as if a specific subset of the records is missing. Did you get any warning messages when running the tool? In Pro, you will see warnings printed to the GP window if some OD Cost Matrix subprocesses fail for some reason. In standalone Python, you would have to explicitly retrieve the messages using arcpy.GetMessages() after the tool finishes running, so maybe you have missed that opportunity.
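A minimal sketch of capturing those messages in a standalone script or in a wrapper such as FME's PythonCaller. `run_and_log` and its parameters are hypothetical names used only for illustration. The only arcpy behavior assumed is that `arcpy.GetMessages()` returns all messages from the last tool run when called with no argument, and only warnings when called with severity 1.

```python
def run_and_log(tool, get_messages, log=print, **params):
    """Run a geoprocessing tool and log its messages, even if it raises.

    tool         -- the tool function to call (hypothetical wrapper argument)
    get_messages -- a callable behaving like arcpy.GetMessages
    log          -- any callable that accepts a string, e.g. print
    """
    try:
        return tool(**params)
    finally:
        # No severity argument: all messages from the last tool run.
        log(get_messages())
        # Severity 1: warnings only, e.g. failed OD Cost Matrix chunks.
        warnings = get_messages(1)
        if warnings:
            log("WARNINGS:\n" + warnings)


# Intended usage where arcpy is available (not executed here):
# import arcpy
# run_and_log(
#     arcpy.LargeNetworkAnalysisTools.SolveLargeODCostMatrix,
#     arcpy.GetMessages,
#     Origins=origins_fc,
#     Destinations=destinations_fc,
#     # ...remaining parameters as in the original call
# )
```

Passing the tool and the message getter as arguments keeps the helper independent of where it runs, so `log` can be swapped for whatever logging mechanism the host application provides.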

WYMcCunE commented 5 months ago

I have been running the arcpy script in FME Form, which does not always retrieve or log all messages.

The join back, for both the Pro output CSV and the arcpy CSV, occurs in FME Form, so the issue is in the CSV output itself. The behavior, including which specific points are missing from the arcpy script output, has been consistent. Both analyses were run on the same day as the original post. [image]

For further testing, I ran only the snippet in an arcpy IPython console and found no difference between the datasets, either in the CSV or, using Feature Compare, in the output origins and destinations.

I am not sure why the arcpy script in FME's PythonCaller produces this unexpected output, but I will reach out to them. [image]

Thank you

mmorang commented 5 months ago

When you run Feature Compare, be sure to check on the option to output all differences. Otherwise it will tell you only the first difference it finds and then stop checking.

I know what FME is, but I don't really know much about it and have no experience using it, so I can't really help you debug what's happening. If you are 100% sure that the raw tool outputs are the same between your manual workflow and your FME workflow before the join, then it seems like the FME join is the problem.

Please let me know what you find out.

WYMcCunE commented 5 months ago

It is not the join. The joins of the three outputs to their respective CSVs all occur in FME (that is where the screenshots are from). The problem is in the actual output CSV: the CSV output from the arcpy run in FME is missing 3662 OriginOIDs that are present in the outputs generated in arcpy and Pro.

~The output origins all match one another, the only difference is the number of columns~ Edit: the differences seem minor and are not related to any of the join keys.

pro-arc: [image]

fme-pro: [image]

mmorang commented 5 months ago

Sorry, I'm having trouble keeping track of which things you're comparing. Can we break it down?

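The tool output should include:

  1. A set of CSV files in a folder, named like ODLines_O_#_#_D_#_#.csv
  2. A feature class with the output origins, which includes OIDs matching what's in the CSV files
  3. A feature class with the output destinations, which includes OIDs matching what's in the CSV files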

Let's forget about the join for the moment. Which of the things above is different between your manual Pro workflow and your FME workflow (before you do the join)?

For the Feature Compare results in your most recent comment: Which feature class is being compared? Is that after the join?

WYMcCunE commented 5 months ago

✔️ A set of CSV files in a folder, named like ODLines_O_#_#_D_#_#.csv
✔️ A feature class with the output origins, which includes OIDs matching what's in the CSV files
✔️ A feature class with the output destinations, which includes OIDs matching what's in the CSV files

All of these are present for each of the three methods of generation (ArcGIS Pro, standalone arcpy, and arcpy run in FME).

All inputs for the three methods are the same. The only difference is where the tool is run.

For the Feature Compare: I am sorting by Origin ID. I am comparing the feature class with the output origins, which has OOID, before the join, i.e. "A feature class with the output origins, which includes OIDs matching what's in the CSV files".

I have updated my original comment for clarity.

mmorang commented 5 months ago

Okay. If I have understood your latest comment correctly, the output feature class of origins (unjoined) is missing rows from the FME workflow. Is that correct?

If that's the case, then we have to figure out why all the origins didn't make it through the processing. The origins feature class is pretty much a straight copy of the input origins feature class. It seems like some of them must be getting filtered out somehow. Does the FME workflow have a limit to the number of features that can be added? Does the input have a selection set or definition query?

WYMcCunE commented 5 months ago

Clarification: The output CSV is missing OIDs we see in the output feature class

mmorang commented 5 months ago

Okay, so the output feature class of origins is identical. Is that right?

The problem is with the output CSV files. Is that right? Okay, let's try to pin down what the problem with the CSV files is. Here are some possibilities:

  1. The number of output CSV files is different. Some CSV files are missing. You should be able to see which chunks are missing based on the naming scheme, ODLines_O_#_#_D_#_#.csv. The # signs represent the ObjectID ranges of the origins and destinations in the chunk.
  2. All the output CSV files are present, but the total number of rows is different. In other words, the contents of the CSV files are different. You can test this by reading all the CSV files into combined pandas dataframes or something like that and counting the rows.
  3. The CSV files are the same and have the same number of rows, but the OriginOID values are different/incorrect. For instance, maybe the OriginOID values were incorrectly reset to 1 and go like 1, 2, 3 instead of 382, 383, 384. This could mess up the joins.

Can you determine which of these best describes the difference in the CSV files between your FME results and your Pro or arcpy results?
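As a rough sketch of how the three possibilities could be checked in one pass, the following uses only the standard library rather than pandas. The function names are hypothetical, and the key column name `OriginOID` is an assumption based on the field mentioned in this thread; adjust it to match the actual CSV header.

```python
import csv
from pathlib import Path


def summarize_csv_folder(folder, key="OriginOID"):
    """Return (file names, total row count, set of key values) for a folder of CSVs."""
    names, rows, keys = set(), 0, set()
    for path in sorted(Path(folder).glob("*.csv")):
        names.add(path.name)
        with path.open(newline="") as f:
            for record in csv.DictReader(f):
                rows += 1
                keys.add(record[key])
    return names, rows, keys


def compare_csv_folders(base, test, key="OriginOID"):
    """Compare two output folders: missing files, row counts, missing key values."""
    base_names, base_rows, base_keys = summarize_csv_folder(base, key)
    test_names, test_rows, test_keys = summarize_csv_folder(test, key)
    return {
        # Possibility 1: whole chunk files are absent.
        "files_missing_from_test": sorted(base_names - test_names),
        # Possibility 2: same files, different total number of rows.
        "row_counts": (base_rows, test_rows),
        # Possibilities 2 and 3: key values (read as strings) that never appear in test.
        "keys_missing_from_test": sorted(base_keys - test_keys),
    }
```

Calling `compare_csv_folders(pro_csv_dir, fme_csv_dir)` with the two output folders (hypothetical variable names) would show which of the three cases applies. If the file lists and row counts match but key values are missing, the OriginOID values themselves are the likely culprit.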

WYMcCunE commented 5 months ago

"Okay, so the output feature class of origins is identical. Is that right?" Correct.

  1. The number of files is the same
  2. The content of the files is different, resulting in dropped rows and differing file sizes

While working on this, I discovered that a different FME build number, in combination with passing the CSV folder name as r"\\servername\shared_folder_for_csvs" instead of "\\\\servername\shared_folder_for_csvs", resolved number 2.

Note that "\\\\servername\shared_folder_for_csvs" did produce the correct output when passed to arcpy. This discrepancy might be due to the interpreter that FME uses for its PythonCaller.
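For reference, a small sketch of how plain Python treats the two spellings. In a standard interpreter, the raw literal and the fully escaped literal are the same string, which is consistent with arcpy producing the correct output for both. The mixed form used above leaves `\s` unescaped; Python keeps unrecognized escapes as written, but newer versions warn about them, so the sketch uses the fully escaped form instead. How FME's PythonCaller processes the text before Python sees it is not covered here and remains an open question.

```python
# Raw string literal: backslashes are taken literally.
raw_path = r"\\servername\shared_folder_for_csvs"

# Regular string literal with every backslash escaped.
escaped_path = "\\\\servername\\shared_folder_for_csvs"

# In a standard Python interpreter these are the same string.
same = raw_path == escaped_path

print(same)            # whether the two literals are equal
print(raw_path)        # the path as the operating system would see it
print(repr(raw_path))  # the escaped representation Python shows for it
```

If a host application reports a different value for `repr(path)` than a standalone interpreter does, the path was probably altered before it reached Python.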

mmorang commented 5 months ago

Oh, interesting. So does this completely solve your problem, or is there still some discrepancy remaining?

WYMcCunE commented 5 months ago

I believe this solves the issue more or less, thank you for your response.