Open espg opened 4 days ago
@pdsmith90 you asked a couple of weeks ago if unit tests were hard to write... this might be a good first issue to get experience with them if you want some practice writing and setting them up.
We need several tests, but the simplest unit test that we need from above is one that takes a list of stations and checks whether there are any duplicate entries within that list. We'd likely end up integrating that unit test with other related unit tests in the same test file: for example, a test that first combines tie stations and subnetworks, and then checks (a) if all the entries from each data structure are present, and (b) if any of those entries are duplicates.
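As a rough sketch of what those two tests could look like under pytest: the helper names `find_duplicates` and `combine` are hypothetical stand-ins (not functions in Parallel.GAMIT), and the station codes are just sample values.

```python
from collections import Counter


def find_duplicates(stations):
    """Return the entries that appear more than once, in first-seen order."""
    counts = Counter(stations)
    return [name for name, n in counts.items() if n > 1]


def combine(subnet_stations, tie_stations):
    """Hypothetical stand-in for the merge step: concatenate the two lists."""
    return list(subnet_stations) + list(tie_stations)


def test_no_duplicates_in_station_list():
    stations = ["igs.badg", "igs.cas1", "igs.coco"]
    assert find_duplicates(stations) == []


def test_combined_list_is_complete_and_unique():
    subnet = ["igs.badg", "igs.coco"]
    ties = ["igs.cas1", "igs.darw"]
    combined = combine(subnet, ties)
    # (a) every entry from each input is present in the combined list
    assert set(subnet) <= set(combined)
    assert set(ties) <= set(combined)
    # (b) no entry appears more than once
    assert find_duplicates(combined) == []
```

Once the real merge code is importable, `combine` would be swapped for it and the assertions would stay the same.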
To see how the tests are set up in general, have a look here at how my clustering tests are set up. The link above points to my Parallel.Gamit fork, since those unit tests and the automated framework to run them aren't in our master branch yet. Once @demiangomez merges #109, any unit tests that you write will automatically run on any push you make to an open pull request into master (or any other branch), which saves you from having to set up the testing framework locally when you're just getting started with these.
@espg: the behavior is to include ALL stations in the stations array in the database (including ties). Method `recover_subnets` appends all stations except those in the ties list. The ties are later appended to the processing list by `GamitSession` and, if the processing is not "ready", a new record is inserted in the database.
Also, I checked with Eric and days 120-239 were run with `-purge`, which deletes everything from the database. So this is not an issue from before. There is something in the code that is duplicating stations.
> There is something in the code that is duplicating stations.
That something in the code that's duplicating happens right here (from line 90 onwards), in `pyGamitSession`:
...and is documented as happening on the last quoted line here:
@demiangomez the question is should the stations and station ties be merged into the `station_instances` list inside of `GamitSession`? The initial cut on clustering (https://github.com/demiangomez/Parallel.GAMIT/commit/fbb575090e16871a5382a2d124f70309eaea5ac8) didn't include station ties at all, since they were included already in the station list. The reason that the station ties list was added back in was for compatibility in the kml output, which marks the tie points with different glyphs. The reason that we don't run the tie point extraction code inside of `GamitSession` is because we need the full dataset of input coords to do the cluster overlap, and we only pass the station subset to the `GamitSession` instances that are run in parallel.
Right now we have a situation where we:
- `pyNetwork` to pull out tie stations for kml compatibility
- `GamitSession` that has the express purpose of replicating what we already have at number 1 of this list

If we decide to address this inside of `pyNetwork` instead of inside `GamitSession`, this is what our code flow will look like:
- `GamitSession`
- `GamitSession`
There are probably better options. We could, as a few possibilities, do some, all, or none of:
- `GamitSession` and remove the tie + station merging code
- `pyNetwork` if we're worried about keeping track of ties

Line 90 onwards of `pyGamitSession` is not duplicating stations that go into the database. It is combining the instances from stations and ties for the GAMIT run. If stations are duplicated when inserted into the database, then it is because the station list contains the ties + stations (created in `pyNetwork`), as far as I can see.
> @demiangomez the question is should the stations and station ties be merged into the `station_instances` list inside of `GamitSession`?
As I mentioned before, YES
> The initial cut on clustering (fbb5750) didn't include station ties at all, since they were included already in the station list. The reason that the station ties list was added back in was for compatibility in the kml output, which marks the tie points with different glyphs.
There are other reasons why we need the ties identified, kml being one, but it is not the most important reason. The point here is that we need to make sure that the station set passed to `pyGamitSession` does not include the ties. Conclusion: we need `pyNetwork` to create a station list that does not have the ties and a ties list that contains the ties only. These are then added together by `pyGamitSession`, but they can be separated because a tie list exists. Please change this ASAP so that we can test everything.
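For reference, a minimal sketch of the separation described above, assuming the cluster station list and the tie list are plain Python lists of station codes. `separate_ties` is a hypothetical helper, not an existing `pyNetwork` function, and the actual data structures in `pyNetwork` may differ.

```python
def separate_ties(cluster_stations, tie_stations):
    """Return (stations_without_ties, ties), preserving input order.

    Hypothetical helper: assumes both arguments are lists of station codes.
    """
    tie_set = set(tie_stations)
    stations = [s for s in cluster_stations if s not in tie_set]
    # dict.fromkeys drops repeated ties while keeping first-seen order
    ties = list(dict.fromkeys(tie_stations))
    return stations, ties


# Sample values only: a cluster list that still contains its tie stations
cluster = ["igs.badg", "igs.cas1", "igs.coco", "igs.darw"]
ties_in = ["igs.cas1", "igs.darw"]

stations, ties = separate_ties(cluster, ties_in)
merged = stations + ties  # what the later merge step would then see
```

With the two lists disjoint, adding them together downstream cannot introduce repeated entries, and the ties remain identifiable through the separate list.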
Thanks
This issue is closely tied to this discussion, so please read the linked content before continuing.
Examining the data from this query:
SELECT * FROM public.gamit_subnets where "DOY"='180' and "Year"='2022'
Shows interesting behavior:
Which outputs the following (note the color highlight)
$${\color{blue}igs.badg,igs.cas1,igs.coco,igs.daej,igs.darw,igs.dumg,igs.guam,}$$ $${\color{blue}igs.hob2,igs.hrao,igs.iisc,igs.kiru,igs.mal2,igs.mcil,igs.mobs,igs.nklg,igs.pohn,igs.pol2,igs.reun,}$$ $${\color{red}igs.cas1,igs.darw,igs.dumg,igs.hob2,igs.hrao,igs.kiru,igs.mal2,igs.mobs,igs.nklg,igs.pol2,igs.reun}$$
All of the red entries above are duplicates of stations already listed in the blue highlighting.
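The duplication can also be confirmed programmatically. The sketch below assumes the `stations` value for that row is available as a single comma-separated string (an assumption about how it is read out of the table); the station codes are copied from the output above.

```python
from collections import Counter

# Station codes copied from the highlighted output (blue part, then red part)
row = (
    "igs.badg,igs.cas1,igs.coco,igs.daej,igs.darw,igs.dumg,igs.guam,"
    "igs.hob2,igs.hrao,igs.iisc,igs.kiru,igs.mal2,igs.mcil,igs.mobs,"
    "igs.nklg,igs.pohn,igs.pol2,igs.reun,"
    "igs.cas1,igs.darw,igs.dumg,igs.hob2,igs.hrao,igs.kiru,igs.mal2,"
    "igs.mobs,igs.nklg,igs.pol2,igs.reun"
)

entries = row.split(",")
counts = Counter(entries)
duplicates = sorted(name for name, n in counts.items() if n > 1)

# 29 entries in total, 18 unique station codes, 11 of them listed twice
print(len(entries), len(counts), len(duplicates))
print(duplicates)
```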
For `public.gamit_subnets` on DOY 180 of 2022, there are 17 listed clusters in the data table, with the first cluster (labeled subnet 0) being the backbone network. That leaves 16 clusters, which correspond to the 16 clusters that `make_clusters` produces. Since index zero in the postgres data table corresponds to the backbone, the indexing is off by 1; i.e., `df.iloc[1].stations` compares to `a[0]` and `b[0]` from `a, b = make_clusters(points.T, stations)`, with "a" and "b" being the `clusters` dictionary and `cluster_ties` list respectively.

This is the zero-th entry for cluster stations from the `clusters` dictionary; note that it's identical to the blue highlighted text from the `public.gamit_subnets` table for DOY 180 in 2022:

Now, this is the output from the `cluster_ties` list, which is identical to the red highlighted text from the `public.gamit_subnets` table for DOY 180 in 2022:

Looking at two additional entries from `public.gamit_subnets` and the `clusters` dictionary & `cluster_ties` list confirms the pattern.

Questions
- `public.gamit_soln` 2022 days 001-008?
- `GamitSession`, we can fix the issue with the code from the previous bullet or similar
- `stations` include the tie stations?
- `GamitSession` wants these two data objects (tie points and station clusters) not to overlap.
- `GamitSession` than what is set up in `pyNetwork`