SuperDARN / rst

Radar Software Toolkit (RST)
https://superdarn.github.io/rst/
GNU General Public License v3.0

hdw.dat github repo #3

Closed ksterne closed 5 years ago

ksterne commented 7 years ago

Think we'll ever be able to separate out the hdw.dat files from this repository and have them look at the hdw.dat repo? Not so much of a problem right now, but maybe this is a long term goal on trying to figure out how to do that. davitpy devs are striving for a similar thing as a long term goal.

egthomas commented 7 years ago

My concern with doing so is that the majority of the RST cannot function without hdw.dat files present. So presumably a new user would need to download and compile the RST, and then go to another github repository and copy the hdw.dat files from there.

pasha-ponomarenko commented 7 years ago

For some distant future, I believe that the optimal solution would be to include concurrent hardware information required for processing and plotting the data in the rawacf files. Technically, adding extra fields to the parameter structure sounds simple and relatively harmless, although, keeping in mind the on-going problems with channels, I would not put my money on it.

sshepherd commented 7 years ago

I respectfully disagree with putting the hardware information in rawacf files. There are too many examples of parameters changing (radar locations, interferometer offsets, tdiff, etc.) that would require reprocessing numerous rawacf files and subsequent downloads from the mirror sites. I think that the hdw.dat files, while not ideal, are a reasonable solution; however, they are poorly maintained.

ksterne commented 7 years ago

Certainly agree that the information within the files would be optimal, though I wonder if it increases the file size significantly to have the same repeated information in each record.

I think my vision here is that either during the compilation process or in the installation instruction set, the github hdw.dat repo is cloned/downloaded. To make this even better, compilation code would go and check for updates to the repo and pull new files when needed.
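A minimal sketch of the clone-or-update step described above, assuming git is installed and using the vtsuperdarn/hdw.dat URL that appears later in this thread. The function names and the destination path are hypothetical and are not part of the RST build system.

```python
import subprocess
from pathlib import Path

# URL as quoted later in this thread; an assumption for illustration.
HDW_REPO_URL = "https://github.com/vtsuperdarn/hdw.dat"


def hdw_sync_command(dest: Path, url: str = HDW_REPO_URL) -> list[str]:
    """Return the git command that brings `dest` up to date with the hdw.dat repo."""
    if (dest / ".git").exists():
        # Already cloned: fast-forward only, so local edits are never merged silently.
        return ["git", "-C", str(dest), "pull", "--ff-only"]
    # First install: fetch a fresh copy.
    return ["git", "clone", url, str(dest)]


def sync_hdw(dest: Path) -> None:
    """Run the sync; raises CalledProcessError if git fails (for example, no network)."""
    subprocess.run(hdw_sync_command(dest), check=True)
```

An install script could call `sync_hdw(Path("tables/superdarn/hdw"))` before compiling; that path is borrowed from a later comment and may not match every installation.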

pasha-ponomarenko commented 7 years ago

Kevin S, my feeling is that the increase in the file size would be negligible even if we add this info to each record, because the PRM structure is much smaller than RAW or FIT. Also, one does not need to add coordinates and boresight to every record anyway; it could be done once per two-hour data file.

Simon, both ways have their advantages and shortcomings, but how frequently does a radar change its position? With respect to interferometry, right now the tdiff information is useless for several reasons, and I am not sure how useful it is going to be in the future, keeping in mind the continuously changing phasing properties of the hardware due to ageing etc. In this case it would be more practical to apply some sort of post-calibration while processing a RAWACF data file. It might sound a bit ambitious, but it is certainly doable. Again, this approach does not necessitate abandoning the current system: one can first look for the presence of hardware information in the data file, and if it is not there, just use the external source.
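The lookup order described above (data file first, external source second) can be sketched as follows. The field names are hypothetical placeholders; they are not the actual rawacf or hdw.dat field names.

```python
from typing import Any, Mapping

# Hypothetical field names, chosen only for illustration.
HARDWARE_FIELDS = ("geo_lat", "geo_lon", "boresight", "tdiff")


def resolve_hardware(record: Mapping[str, Any], external: Mapping[str, Any]) -> dict[str, Any]:
    """Prefer hardware values embedded in the data record, else use the external source."""
    resolved = {}
    for name in HARDWARE_FIELDS:
        if record.get(name) is not None:
            resolved[name] = record[name]
        else:
            resolved[name] = external[name]  # KeyError if neither source has the field
    return resolved
```

Swapping the two lookups gives the opposite precedence, where the external file overrides whatever the record carries.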

sshepherd commented 7 years ago

Having some parameters that change only every two hours (assuming that files don't occur more frequently) and others that don't complicates the processing. There are a few radars whose position is incorrect to tens of km and one that is at least 100 km. Gareth mentioned that the boresight of Halley has changed by more than one beam over the years. Yes, these are infrequent but require reprocessing many years worth of rawacf files and pushing them out through the mirror.

Yes, our ability to make AoA measurements is pretty bad, but I don't think modifying rawacf files is the solution.

Lastly, I think it should actually be the other way round: if there is a number in the hdw.dat file, that should be used rather than the number in the rawacf file. We have an example of this already with incorrect rxrise values in rawacf files (Evan will correct me if I am wrong here). Unless we reprocess many years of data for most of the radars, these numbers are not correct and affect geolocation of scatter. The change is likely less than a range gate but could be important as we continue to increase our range resolution.

ksterne commented 7 years ago

I feel as though we're getting a little off topic here. Yes, there are many more questions about hdw.dat files and issues related to them, but please take those to Italy or somewhere else. Here I'm trying to find the answer to the question: how do we keep the hdw.dat files in this repository synchronized with the hdw.dat repository?

As we mentioned before, it's not likely something that's going to happen in the coming days or weeks, but it would be a good long-term goal to strive for.

sshepherd commented 7 years ago

As Evan pointed out, RST does not function without these files so they should be part of the RST distribution. The file "radar.dat" should be included since it is also required.

egthomas commented 7 years ago

That would be an interesting solution if the compilation software could automatically clone/copy from the hdw.dat repository. Does it work to have one directory tied to a git repository (hdw.dats) inside of a larger directory tied to a different git repository (the rst)?

kkotyk commented 7 years ago

@egthomas https://git-scm.com/book/en/v2/Git-Tools-Submodules

sshepherd commented 7 years ago

Is this still an issue? The real problem seems to be not where the hdw.dat files are located (they need to be part of the RST) but how to get files updated correctly when known changes are made and those doing the changes do not update the appropriate hdw.dat files.

asreimer commented 6 years ago

As I mentioned to @ksterne (and maybe @egthomas) at the workshop, git has a nice fancy feature called "submodules": https://git-scm.com/book/en/v2/Git-Tools-Submodules

This is a perfect usage case for these.
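For readers unfamiliar with submodules, here is a rough sketch of the git commands involved, wrapped in Python so they can be printed or passed to `subprocess.run`. The repository URL and the `tables/superdarn/hdw` path are taken from other comments in this thread and are assumptions here; the helper names are hypothetical.

```python
import shlex

# URL and path are borrowed from elsewhere in this thread; treat both as assumptions.
HDW_REPO_URL = "https://github.com/vtsuperdarn/hdw.dat"
HDW_PATH = "tables/superdarn/hdw"


def submodule_setup_commands(url: str = HDW_REPO_URL, path: str = HDW_PATH) -> list[list[str]]:
    """Commands a maintainer would run once to register hdw.dat as a submodule."""
    return [
        ["git", "submodule", "add", url, path],
        ["git", "commit", "-m", f"Add {path} as a submodule"],
    ]


def submodule_fetch_commands() -> list[list[str]]:
    """Commands a user would run after a plain `git clone` to populate the submodule."""
    return [["git", "submodule", "update", "--init", "--recursive"]]


if __name__ == "__main__":
    for cmd in submodule_setup_commands() + submodule_fetch_commands():
        print(shlex.join(cmd))
```

Users who clone with `git clone --recurse-submodules` get the submodule contents in one step and can skip the separate update command.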

sshepherd commented 6 years ago

I'll just comment (as I did in 2013, and just reread) that we don't really need any more 'fancy' solutions for the hdw.dat files. We just need a location (website) that keeps them up to date and where people can go and get them. Yes, they need to go into the RST, but having a website where they are all located (and are up-to-date) would, in my opinion, be infinitely more useful than the 'fancy' solutions.

Simon

asreimer commented 6 years ago

@sshepherd submodules are hardly a fancy solution. It's a standard thing to do and it's a standard part of git. All that does is make it much easier to maintain the same files in two (or more) different places. In other words, it makes the job easier for people like @ksterne.

If the files don't need to be in their own separate repo, then there's no need to use the git submodule functionality.

I agree about having a link on a website pointing people to them. Which website? That discussion seems out of scope of this "hdw.dat github repo" issue.

sshepherd commented 6 years ago

I was merely using your own words, Ashton. No, we just do it. I'll do it if you like. If it's useful, then it serves its purpose.

asreimer commented 6 years ago

I'll be sure to remember to be entirely deadpan in future comments so my words aren't used against me :)

mts299 commented 5 years ago

@ksterne @asreimer Did we figure out a solution for this? I wouldn't mind a separate repo for Hardware files since pydarn will need them :)

asreimer commented 5 years ago

@mts299

I think @ksterne and I are on the same page about either: 1) pulling hdw.dat out of RST altogether and then requiring the hdw.dat repo to be installed as a dependency, or 2) making the hdw.dat in RST a submodule of a standalone hdw.dat repo.

There is already a separate repository for hdw.dat files: https://github.com/vtsuperdarn/hdw.dat but maybe that repo should be moved under the "SuperDARN" org banner from the VTSuperDARN org? @ksterne @egthomas any opinions on this?

egthomas commented 5 years ago

I've tried out the submodule approach on a branch in a forked repository:

https://github.com/egthomas/rst/tree/hdw_submodule

The biggest problem I see is that if someone downloads a zip archive of this branch, the hdw submodule is not included (ie the tables/superdarn/hdw directory is present but empty). This means the user would need to download the hdw repository separately, further complicating our installation instructions for the subset of users who do not use git (and breaking the DOI versioning of RST as a stand-alone product).
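One way an install script could detect the empty-directory case described above is sketched below. The `hdw.dat.*` file-name pattern is an assumption about how the hardware files are named, and the function name is hypothetical.

```python
from pathlib import Path

# Directory named in the comment above; the file-name pattern is an assumption.
HDW_SUBDIR = Path("tables") / "superdarn" / "hdw"


def missing_hdw_files(rst_root: Path, pattern: str = "hdw.dat.*") -> bool:
    """Return True if the hdw directory is absent or holds no hardware files."""
    hdw_dir = rst_root / HDW_SUBDIR
    return not hdw_dir.is_dir() or not any(hdw_dir.glob(pattern))
```

A build script could call this early and print a pointer to the hardware-file repository instead of failing later with a less obvious error.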

mts299 commented 5 years ago

Not to be the devil's advocate here, but I could argue that having the hardware files in RST alone makes the installation process for pydarn more difficult as well, and superfluous if users don't want RST on their computer. (This might be why VT had their own repo, for DaViTpy installation purposes.)

I mean the installation instructions would just increase by one extra step, saying "download hardware.zip inside RST/hardware/folder or set some environment variable." I feel most people get stuck on the CDF installation anyways :p

Another option is creating some git hook (not sure how to do this), but the idea would be that the hardware files live in another repo, and every time you update that repo the files are re-zipped and stored in the RST repo. Then during installation, you just add an unzipping process in the build script. That keeps them in both the git and zip installations.
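The zip-then-unzip idea could look roughly like this, using only the Python standard library. Both function names are hypothetical, and the sketch assumes the hardware files sit in a single flat directory and that the archive is written outside that directory.

```python
import zipfile
from pathlib import Path


def bundle_hdw(src_dir: Path, archive: Path) -> list[str]:
    """Zip every regular file in `src_dir` (non-recursive) and return the names stored."""
    names = sorted(p.name for p in src_dir.iterdir() if p.is_file())
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(src_dir / name, arcname=name)
    return names


def unpack_hdw(archive: Path, dest_dir: Path) -> None:
    """Build-time step: extract the bundled files into the RST tree."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
```

`bundle_hdw` would run on the hardware-file side whenever that repo changes, and `unpack_hdw` would run inside the RST build script.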

DOI, I don't really know about that ¯_(ツ)_/¯

asreimer commented 5 years ago

@mts299 Exactly. The point is that HDW.DAT files are not exclusively used by RST, so RST shouldn't have ownership.

@egthomas is this really a problem? I don't think so. Similar to what @mts299 said, you can even easily attach binaries to a release when you make it. This is a feature built in to github. So just zip up the HDW.DAT repo and attach that as a binary with the RST release.

None of this breaks the DOI versioning at all. What does "stand-alone product" mean? Do we ship a full linux distro with RST? We should adjust the thinking process here and keep in mind that what we are discussing is properly handling HDW.DAT as a dependency, just like CDF.

pasha-ponomarenko commented 5 years ago

Ashton, there is a clear distinction: hardware information is essential to RST operation while CDF conversion is non-essential.

asreimer commented 5 years ago

@pasha-ponomarenko Of course! I agree with that, however, both are still dependencies. edit: we also ask users to install a bunch of linux libraries, which are 100% essential to RST operation, but we don't package those in to RST.

hdw.dat files are used in several other software packages used in the SuperDARN community; they are not exclusive to RST. This is the distinction that matters from a dependency point of view.

This is why the VT group has a standalone hdw.dat repo isn't it?

Circling back to the whole point of @ksterne opening this issue: Why are we maintaining multiple sources of hdw.dat files in the community and duplicating time/effort? Doesn't make sense. So given the needs of the community (using hdw.dat without needing RST), what is the problem with pulling hdw.dat out of RST and making them a dependency?

There doesn't actually seem to be any real problem with this, barring one extra one-time step of installing hdw.dat when installing RST.

asreimer commented 5 years ago

Alright so, doesn't seem like the submodule thing is what people want to do/it doesn't integrate well with github.

So, in the interests of getting things done, if I were to:

  1. create a hdw.dat repo under the SuperDARN org
  2. clone the existing vtsuperdarn/hdw.dat repo into it
  3. create a PR in RST for making hdw.dat a dependency by:
     a) removing the hdw.dat directory from the RST repo
     b) changing the readme and install instructions

Would this be a waste of time or would such a thing be accepted?

ksterne commented 5 years ago

Sorry for the slow response; still trying to get caught up on things from August of last year.

Is there a big reason to move the hdw.dat repo from the vtsuperdarn to the SuperDARN org? The only reason that it's in the vtsuperdarn org is:

  1. That org was formed before this one as a few of us here at VT took to github a while ago
  2. I'm the co-chair of the DDWG which has been in control of these files since they are "data" files. Though that was back before the DAWG was as active as it is.

The downside I see of this move is that I've already established a working repo that's being used and referenced around the community. We'll have to make an effort to update scripts and announce a new location for these files.

asreimer commented 5 years ago

@ksterne So moving the repo seems to be a decision that DDWG and DAWG should make.

I only suggest moving it because there isn't really anything VT specific about the hdw.dat files and RST has been moved to the institution agnostic SuperDARN org, so why not do the same thing for consistency.

edit: @ksterne any opinion about pulling hdw.dat out of RST and making it a required dependency, just like how other dependencies are handled?

pasha-ponomarenko commented 5 years ago

Why does one have to remove the hdw.dat from RST rather than update it if/when necessary?

pasha-ponomarenko commented 5 years ago

..to remove from RST...

asreimer commented 5 years ago

@pasha-ponomarenko RST is not the only thing in the SuperDARN community that needs to use the hdw.dat files and there's already a standalone hdw.dat repo. So the process for updating hdw.dat files is currently: 1) Update the vtsuperdarn/hdw.dat repo 2) Update the SuperDARN/rst repo

That takes extra time and duplicates effort.

edit: davitpy needs hdw.dat files and pydarn will need them too. Anyone who rolls their own fan plotting software needs the hdw.dat files. How many repos do we need to worry about updating when one hdw.dat file changes?

pasha-ponomarenko commented 5 years ago

My concern is that if it is removed from the RST repo, then there is a danger of not being able to install a working version of RST if the VT repo is down. What is wrong with having a "generic" (i.e. the latest available at the moment of RST version release) copy of hdw.dat in the RST bundle? Upon installation, make might then try to update the file, and if the update is impossible, RST will still be working properly -- hardware information is not updated every week...

I might be wrong, but for me this looks like a possible scenario.

Back to my IT duties... ;-)
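The "bundled copy plus best-effort update" scenario above can be sketched like this. The `fetch` callable stands in for whatever download mechanism is used (a git pull, an HTTP download); it and the function name are hypothetical.

```python
from pathlib import Path
from typing import Callable


def choose_hdw_dir(bundled: Path, updated: Path, fetch: Callable[[Path], None]) -> Path:
    """Try to fetch newer files into `updated`; fall back to the bundled copy on any failure."""
    try:
        fetch(updated)
    except Exception:
        # Offline, host down, etc.: the release-time copy still lets RST run.
        return bundled
    return updated
```

The design choice here is that a failed update is never fatal: the caller always gets back a directory it can use.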

asreimer commented 5 years ago

@pasha-ponomarenko I really don't think that concern is valid. We have a really bad habit of inventing unrealistic worries. If the VT hdw.dat repo is down, github would be down and you wouldn't be installing RST anyway.

If you want to bundle the hdw.dat with RST when you make an RST release, you can. You can create a zip of the hdw.dat repo and attach it to the RST release as an attached binary. This is built-in functionality of github. The release is what is linked to in Zenodo, so the DOI would redirect.

asreimer commented 5 years ago

Here's another issue that the current RST fails to address, but pulling hdw.dat out would solve:

What if I want to use RST4.0 but the latest hdw.dat files?

sshepherd commented 5 years ago

I agree with Pasha. RST is the de facto software package for SuperDARN and the hdw.dat files should be part of it. Yes, there are others. Many others. But they should adjust the way they incorporate hdw.dat files, not the other way around.

asreimer commented 5 years ago

@sshepherd this is a tautology. And the point is that the hdw.dat files are already in a separate repository, maintained by the DDWG (correct me if I'm wrong @ksterne), and other software packages are already just pulling from it.

asreimer commented 5 years ago

So, @ksterne still hasn't gotten an answer to the question, "how do we keep the hdw.dat files in this repository synchronized with the hdw.dat repository?"

To summarize, we've discussed:

  1. using git submodules, but that doesn't seem to integrate nicely with github releases
  2. removing the hdw.dat files from RST and requiring users to install from the existing vtsuperdarn/hdw.dat repo. People don't seem to like this option either, despite the issue of packaging hdw.dat with RST releases being addressed.

@ksterne it looks like the best path forward on this issue may be to take it offline and make this a working group issue at the workshop. I think it's safe to close this issue here as there isn't any consensus.

So probably this issue can just be closed?

mts299 commented 5 years ago

Sorry for the late response; maybe I could have cleared things up a while ago.

@pasha-ponomarenko For the NASA CDF library, that was my bad on not realizing it wasn't essential. In the past, I have just noticed users struggling with that library and the make build system failing because of it. However, that seems to be a build issue, so out of scope for this issue.

@ksterne Personally, my solution may not depend on where the repo is; centralization is just nice. One point I will make about these hardware files (from a software point of view): they seem more like configuration files than data files. So I would argue they are a DAWG problem, because they are used for software purposes to do analysis and plotting with the raw data. At the last conference I sat in on both working groups, and DDWG seemed more concerned about data distribution and keeping the data files consistent than about hardware file updates. However, I don't know much, so my opinion at this point would be to have only the chairs of DDWG and DAWG meet up and discuss who "owns" or deals with the hardware files. Anyways... moving on to the actual topic at hand...

@pasha-ponomarenko and @sshepherd I see the concern of removing hardware files from RST, and I may have a solution to satisfy everyone.

Here is my proposed plan, which I will try to test first in forked repos to ensure it is a possibility. Synchronization between repos :) done. The idea here is that github has these nice webhooks (that we use for our website) with which you can synchronize files between repos (not sure about between github organizations). We would have a hardware file repo containing all the hardware files, and every time it is updated it will copy those files to the RST repo. Thus things are the same, and people are only updating in one place. This will also solve the concern of zipping it, with no extra installation needed.

However, I need some time to do some tinkering with it and see if there are any issues or caveats that we would have to discuss. I will look into this in March when I get to developing plots with pydarn needing hardware files.

PS. If this all fails pydarn probably could just deal with pulling from VT's repo for hardware files.
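As a rough illustration of the webhook idea: a GitHub push webhook only delivers a JSON payload to a URL, so some receiving service still has to do the copying. The sketch below shows one piece of such a service, working out which files to copy or delete from a push-event payload. It assumes the payload's `ref` and per-commit `added`/`modified`/`removed` lists, that commits are listed oldest first, and a branch named `master`; the function name is hypothetical.

```python
from typing import Any, Mapping


def changed_paths(payload: Mapping[str, Any], branch: str = "master") -> dict[str, set[str]]:
    """Collect files to copy or delete from a GitHub push-event payload."""
    result: dict[str, set[str]] = {"copy": set(), "delete": set()}
    if payload.get("ref") != f"refs/heads/{branch}":
        return result  # ignore pushes to other branches
    for commit in payload.get("commits", []):
        for path in commit.get("added", []) + commit.get("modified", []):
            result["copy"].add(path)
            result["delete"].discard(path)
        for path in commit.get("removed", []):
            result["delete"].add(path)
            result["copy"].discard(path)
    return result
```

The receiving service would then apply those changes to a checkout of the RST repo and push a commit, which requires write access to that repo.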

egthomas commented 5 years ago

So after some more experimentation I think the submodule implementation is clever but clunky. As others have previously suggested, why don't we create a new repository under the SuperDARN organization for the hdw.dat files so everything is under one umbrella? Official updates to hdw.dat files will be made directly to this repository.

When it's time for a new RST release, we just copy the updated hdw.dat files into the appropriate directory of the RST on the release branch. This way a complete set of up-to-date hdw.dat files (at the time of release) are included in the RST package. (Note that we already have to manually go in and update a few things on the release branch such as the rst.version file.)

Other repositories which depend on the hdw.dat files can still use submodules or some other method to pull from that single hdw.dat repository and everyone is happy(-ish).
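The release-time copy step described above could be scripted along these lines, so that the release manager also sees which files actually changed. This is a sketch with a hypothetical function name; it assumes a flat directory of hardware files on both sides.

```python
import filecmp
import shutil
from pathlib import Path


def copy_changed_hdw(src_dir: Path, dest_dir: Path) -> list[str]:
    """Copy files that are new or differ in content; return their names, sorted."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        dest = dest_dir / src.name
        # shallow=False forces a byte-by-byte comparison rather than trusting timestamps.
        if not dest.exists() or not filecmp.cmp(src, dest, shallow=False):
            shutil.copy2(src, dest)
            copied.append(src.name)
    return copied
```

The returned list could go straight into the release notes as a record of which radars' hardware files were updated.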

ksterne commented 5 years ago

I've got some follow up from this going back to @mts299's post but waiting on @kevinkrieger's go ahead to send a note out here. He's out this week so I'll follow up with him early next week.

ksterne commented 5 years ago

Here's the summary from our meet-up:

@pasha-ponomarenko, @kevinkrieger, and @ksterne met up via teleconference in early February 2019 to discuss some issues that had come up in this issue. At the core of the issue was having to maintain 2 separate copies of the hdw.dat files, one in this repo and one in the vtsuperdarn/hdw.dat github repo. The original question of this issue was whether or not the rst repo could use the hdw.dat repo as a source. Several alternatives were discussed but all had issues associated with them; the most reasonable and straightforward solution appears to be utilizing webhooks that @mts299 suggested to automatically update the rst hdw.dat files when the main hdw.dat repo is updated. Since it has already been used as a reference repo for several groups in the SuperDARN community, the vtsuperdarn/hdw.dat repo will be used as the reference for the webhooks for now.

So, @mts299, do you think you might be able to implement this for us here? Otherwise someday I could try taking a look at it, no promises as the spring is filling up quickly.

mts299 commented 5 years ago

@ksterne yes, I think I can implement this; however, I won't be able to look at it until March, if that is okay? I may also need some help from you and @egthomas just in case I need admin privileges between the github repos.

egthomas commented 5 years ago

I'm going to again respectfully disagree with keeping the hdw.dat files under an institution-specific repository. None of the SD working groups are tied to any one institution - they are comprised of members from different institutions across the world. The idea that users can't cope with a change in software/file location is the same argument that was presented when moving from the VTRST3.5 repository to this one.

To practice what I'm preaching, I've moved the monthly schedule files from the Dartmouth SD website to a new repository under the SD organization: https://github.com/superdarn/schedules

kevinkrieger commented 5 years ago

Hey @egthomas do you have a measure for how many people use the hdw.dat repo in a scripted way? Also, do you have a measure for how many people use the Dartmouth website schedule files in a scripted way? (We at usask do, so we will need to change this now if you are removing them from the website, but I suspect not everyone does).

ksterne commented 5 years ago

@egthomas, we certainly discussed where the files should live. Our idea for now was to get the webhooks working with the repo as it is now. Then once that's established and working, we'll revisit and likely be able to move the repo to a new org. Not to spend too much time on it, as it's a deviation from this issue, but the vtsuperdarn org is institution-specific in name only; it has members from around the SuperDARN community, well beyond just VT. Most of them have moved on or are no longer with us. And I agree that working groups are composed of several institutions.

My statement above wasn't a forever and always kind of idea. But for starters let's use the vtsuperdarn/hdw.dat repo. Then in 6 months, a year, once we're comfortable with things, we can see about moving to the SD org and deprecating the vtsuperdarn repo. Does this sound more agreeable @egthomas?

I too had started to use your schedule files as a reference on our website, so I'll need to move these links in the future.

mts299 commented 5 years ago

@ecbland how do you feel about closing this issue and opening a new issue specifically for the webhooks and once those are settled we can open another issue on hardware file repo location?

This issue has been discussed numerous times now and there seems to be a plan in motion for solving the original issue. However, this issue has ballooned out, so I think a lot of the original intent has been lost.

ecbland commented 5 years ago

@mts299 Yes, that's a good idea. Go for it!