IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

New dialogue to export data for Climsoft #6377

Open rdstern opened 3 years ago

rdstern commented 3 years ago
  1. This is to be added to the Climatic > File menu. It is called Export for Climsoft. It can be the first Export. At the same time please change the other export items to use "for" rather than "to". (I suggest that the word "for" reflects that we are preparing the file correctly, but we are not then going into Climsoft.)
  2. The data must be defined as climatic. In the definition of Climatic it must have a Station variable, which is the Station ID, even if it is for a single station. Also all the elements must be defined in the list.
  3. There is the usual data selector.
  4. There is a single receiver for the Station ID - this is filled automatically.
  5. There is the usual Date receiver - this is filled automatically.
  6. There is a multiple receiver for the elements. Default is for all the elements to be included.
  7. Then the save file as, with the Browse. (The default is csv and maybe that is all we allow.)
  8. Interesting to allow Export Comments also. If Excel is acceptable that would mean the 2 data frames can be exported together.

For this to work smoothly we will have to stack the data when there are multiple elements. We will also have to include the Climsoft codes with the elements if this is not done already. I doubt it.

And we need to know how to account for trace. They are stored as 0.03 in R-Instat and we could arrange for a flag variable to be constructed?

Patowhiz commented 3 years ago

@rdstern it's important to note that Climsoft only imports csv. So what's the intention here? I thought the main intention was to have the exported data immediately ready for importing into Climsoft. If that's the case, exporting the comments and exporting to Excel wouldn't be ideal.

I agree we could have 0.03 exported with flag T for trace. We would also need to have the missing values exported with M for missing.

Patowhiz commented 3 years ago

@rdstern regarding the elements, I also think Climsoft will only accept element codes. So will the data frame always have to have elements metadata?

mhabimana commented 3 years ago

@Patowhiz - Climsoft will accept data files in any text format, including CSV, but not Excel.

mhabimana commented 3 years ago

@Patowhiz - True, the data file will need an element code to describe the data, especially when the file contains data for multiple elements.

rdstern commented 3 years ago

@Patowhiz I am delighted to continue this discussion to clarify the needs for this dialogue. I am still not sure about the importing of data into Climsoft, that includes the flag fields. This is particularly important to be able to include the trace values. I assume in R-Instat they will remain as 0.03, but the export dialogue will change this into 0 and T in the flag field, to fit with Climsoft needs.
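
The trace handling proposed here can be sketched in a few lines. This is only an illustration in Python, not R-Instat or Climsoft code. It assumes trace is stored as 0.03 and is exported as 0 with flag T, and it uses the M flag for missing values suggested earlier in the thread. The names `TRACE_VALUE` and `value_and_flag` are hypothetical.

```python
import math

# Value used to store trace rainfall in R-Instat, according to this thread.
TRACE_VALUE = 0.03


def value_and_flag(rain):
    """Return (value, flag) as strings for one rainfall observation.

    Assumption: missing values are passed as None or NaN.
    """
    if rain is None or (isinstance(rain, float) and math.isnan(rain)):
        return "", "M"  # missing: empty value, flag M
    if math.isclose(rain, TRACE_VALUE, abs_tol=1e-9):
        return "0", "T"  # trace: export 0 with flag T
    return str(rain), ""  # ordinary value, no flag


# Example: a few daily rainfall values, including trace and missing.
for rain in [0.0, 0.03, 12.5, None]:
    print(rain, value_and_flag(rain))
```

Whether the exported trace value should be 0 or stay as 0.03 is still open in this discussion, so the branch for trace is the one line to change.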

The default export will certainly be a csv file. But we also want to be able to export the Comments sheet, to be able to guide Climsoft staff on any special features of the data. The default might be to export that as a second csv file.

However, I would still like to consider having an option to export to Excel for some users. This can then be viewed in Excel and then saved from Excel as csv, to facilitate entry into Climsoft. That would enable the data and the comments to be exported together. Is there an extended csv that would also allow multiple sheets?

In the (slightly?) longer term I note that both mysql and mariadb have an Excel addin to facilitate transfer from Excel to these data bases. We find that many users in NMSs have some of their data in Excel. Perhaps Climsoft might wish to offer that option in the future?

Patowhiz commented 3 years ago

@rdstern I agree, if we approach this with the long term view of climsoft supporting importing from excel, it makes sense.

With @mhabimana help we could add that feature in the next climsoft release.

maxwellfundi commented 3 years ago

@Patowhiz Can we have you and someone work on this? Maybe @Gioche6 with @Vitalis95? Is this a new dialog that they can get started on?

rdstern commented 3 years ago

@maxwellfundi this will be an important dialogue, but it is lower priority than the current issues related to the Describe menu. In general I am de-emphasising climatic as important just now, until we get more funding to continue with that work. If you want someone to work on a new and important dialogue, then the caret dialogue is very important and not as difficult as it may seem initially, because it is similar to an existing one. The define survey data is also important, but harder. I would give each of these higher priority than the export.

Even higher priority, and easy, is under Describe > Three Variables > Pivot Table/Chart. @Ivanluv already has the 2 variable and the 3 variable should follow easily!

rdstern commented 2 years ago

@Patowhiz this was asked for, by KMD. I suggest it could be a good time to work on this dialogue now. But you have quite a lot on your plate, so I wonder if someone else can at least start on this one. I suggest the items above are all useful. I am still going with the idea of an export to Excel, so multiple sheets can be exported, including the comments. Then the Excel data sheet can be saved as csv for the current Climsoft?

@N-thony if Patrick is not to work on this dialogue now, then I wonder who? @Vitalis95 is interested in working on climatic stuff?

@Patowhiz and/or @Wycklife what would be useful is some simple examples of data that is in the shape to be imported easily into Climsoft. Possibly several examples including:

a) Single station, just rainfall data. Is it any more complicated if there are multiple stations, with rainfall data?
b) Single (and/or multiple) station with flag information, so trace can be imported.
c) Single or multiple stations with multiple elements - perhaps particularly rainfall, tmax and tmin. With and without any flag information.

Thanks.

N-thony commented 2 years ago

@Patowhiz thanks for the discussion, can you share your idea on this so that @Vitalis95 will start the work?

Patowhiz commented 2 years ago

@N-thony let me look into this then get back with concrete answers that @Vitalis95 can use.

Patowhiz commented 2 years ago

@rdstern in climsoft we can import;

  1. Hourly or daily data - station id and element code should be in a column. And they can be multiple. The station id and element code should exist in the database. If they are not in the data file, then you can still choose them while importing. The observation date time can be in the form of a datetime column or separate columns of year, month, day and hour. If the hour column is missing you can choose an hour when importing.
  2. Monthly - similar to hourly or daily data. Difference is, Climsoft will append the period(days in the month) while saving the data.
  3. Multiple Element Columns - Data file that has the elements as columns. These columns will just be matched by the user to the respective elements in Climsoft. The observation date time must be in the form of separate columns of year, month, day and hour. If the hour column is missing you can choose an hour when importing.

Currently Climsoft cannot import a flag. So such a column will just be ignored.

Below are sample files of the form that I suggest we initially produce. We can also produce them using our existing dialogs.

I've leaned towards date being in separate columns because we don't quite cope well with date time columns in R-Instat. CLIMSOFT IMPORT FORMAT.zip
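
To make the "multiple element columns" shape concrete, here is a minimal Python sketch, offered as an illustration only. It splits a date into separate year, month, day and hour columns and keeps one column per element. The column headers, the helper names and the default hour of 6 are assumptions for the example; the actual headers are those in the attached sample files, which are not reproduced here.

```python
import csv
import io
from datetime import date


def to_climsoft_rows(records, elements, default_hour=6):
    """Build one row per station and day, with the date split into columns.

    Assumptions: each record has 'station' and 'date' keys, and missing
    element values are None. default_hour is a placeholder value.
    """
    rows = []
    for rec in records:
        d = rec["date"]
        row = {
            "station_id": rec["station"],
            "year": d.year,
            "month": d.month,
            "day": d.day,
            "hour": default_hour,
        }
        for element in elements:
            value = rec.get(element)
            row[element] = "" if value is None else value  # blank when missing
        rows.append(row)
    return rows


def write_csv(rows, elements):
    """Return the rows as csv text, elements as trailing columns."""
    fields = ["station_id", "year", "month", "day", "hour", *elements]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Example with made-up data for one station.
sample = [
    {"station": "S001", "date": date(2022, 6, 14), "rain": 0.0, "tmax": 28.4, "tmin": None},
]
print(write_csv(to_climsoft_rows(sample, ["rain", "tmax", "tmin"]), ["rain", "tmax", "tmin"]))
```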

Patowhiz commented 2 years ago

Here is the same sample file in a single rds file. This has another sample that has the level column that Climsoft can ingest (especially for upper air data). climsoft_sample_file_formats.zip

rdstern commented 2 years ago

@Patowhiz thanks - great. With rainfall, how do we organise trace values for import into Climsoft?

Patowhiz commented 2 years ago

@rdstern the value will have to carry the T flag, for instance 0T. Then Climsoft will separate the value and flag during import.
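
As an illustration of the "0T" encoding only, the following Python sketch joins a value and a flag into one field and splits them apart again. It does not reproduce Climsoft's own parsing, which is not shown in this thread. The function names are hypothetical, and the split assumes the flag is a run of trailing letters.

```python
def fuse(value, flag=""):
    """Join a value and an optional flag into one field, e.g. 0 and 'T' -> '0T'."""
    return f"{value}{flag}"


def split_fused(text):
    """Split a fused field such as '0T' into (value, flag).

    Assumption: the flag consists of the trailing alphabetic characters.
    """
    text = text.strip()
    cut = len(text)
    while cut > 0 and text[cut - 1].isalpha():
        cut -= 1
    return text[:cut], text[cut:]


# Example round trip for a trace value and an ordinary value.
print(split_fused(fuse(0, "T")))
print(split_fused(fuse(12.5)))
```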

Vitalis95 commented 2 years ago

@rdstern @Patowhiz, this is the sketch of the dialog. Have a look at it and let me know if there is anything I still need to add before I start implementing.

WhatsApp Image 2022-06-14 at 2 43 32 PM

N-thony commented 2 years ago

@Vitalis95 good start, @rdstern what do you think?

Patowhiz commented 2 years ago

@Vitalis95 this looks good so far.

I suggest you remove the date control, until when we support date time objects well.

Then also add an hour input text box, this will be for setting the default hour that will go to the hour column.

The station id will always be optional. If the elements are stacked, then we have to unstack them with their values. If they are already unstacked we don't need to. I'm not sure how you can adopt this functionality in your design.

You could also add an input text box for the level, default should be surface.

In R-Instat we like analyzing data that is stacked. For climsoft I'm recommending producing data that is stacked by Station Id and elements in an unstacked format. Have a look at the rds file that I have provided above.

I'm happy to discuss this online.

Thanks.

Vitalis95 commented 2 years ago

@Patowhiz thanks, we can have a call on skype

rdstern commented 2 years ago

a) @Patowhiz and @Vitalis95 I would like to keep the date field, even though it isn't used directly. It is needed before data frames are defined as climatic and I strongly suggest this dialogue could usefully be for defined climatic data.

b) I agree we need an hour field. I assume this could be a textbox, for a single value. But I suggest it could also take a variable, so could then be used to export hourly or other synoptic data.

c) When defining data as climatic, could we also add the Climsoft code, e.g. 5 for rain into the metadata. That code will then be used in the export.

d) When we export to Excel, can we add an option to include the comments sheet into the export?

Patowhiz commented 2 years ago

@rdstern I totally agree with your item (b)

Adding and working with climsoft element codes is what I'm trying to avoid, they are generally not standard and may differ in different Met services. To me that's an unnecessary extra step to users. In the output format that I have suggested, climsoft will always force you to match them.

I'm happy to retain the date, but then I suggest removing the year, month and day receivers. The dialog should then produce those columns from the date column. Then once we are able to support date time we can also remove the hour input.

We still don't support Excel import in Climsoft, and the discussions on how to store such comments have not been agreed on yet. So I suggest that initially this dialog strictly assumes the export will be as csv.

I'm also still not sure if this dialog should be an export dialog or just a reshape dialog?

rdstern commented 2 years ago

@Patowhiz I would also be happy if this were to be a reshape dialogue, rather than an export dialogue. That would also solve my problem of wanting to propose "good practice", while also keeping the system as simple as possible for the user.

You see, even if Climsoft can only import csv, I would like to encourage users to look at the data first, and this would be simpler if it were exported to Excel, or if it produced a new data frame. Then we use the ordinary File > Export.

So, now, after your suggestion above I propose either we just produce a new data frame, or we also have an optional export (with a checkbox, default unchecked) to export csv as well.

Patowhiz commented 2 years ago

Agreed. We can initially just have it produce a new data frame. Then later @Vitalis95 can add the option of export. Then @rdstern you may have to now propose a new name for the dialog and its menu item label.

N-thony commented 2 years ago

@Vitalis95 how is it going with this?

Vitalis95 commented 2 years ago

@N-thony, I have started adding controls to the dialog. I am supposed to have a call with @rdstern or @Patowhiz so that I will know the functions to use. I have texted them on Skype.

N-thony commented 2 years ago

@Vitalis95 how is it going here?

Vitalis95 commented 2 years ago

@N-thony , I talked with @rdstern and he was to comment on this. The comment will guide me on implementing the dialog

N-thony commented 2 years ago

@rdstern could you comment on this so that @Vitalis95 moves to the next step? Thanks.

rdstern commented 2 years ago

@N-thony after discussion with @Patowhiz I assumed the dialogue should not actually duplicate the File > Export dialogue, but just prepare a data frame that would be in the right shape for the export. @Vitalis95 asked about examples that produce a new data frame and I have suggested that the code would presumably be similar to that used by any of the dialogues in the Prepare > Data Reshape menu, all of which produce a new data frame. This includes the Transpose dialogue that @Vitalis95 worked on.

N-thony commented 2 years ago

@Vitalis95 any progress? This has been here for long.

N-thony commented 1 year ago

@Vitalis95 can you check with @Patowhiz about this issue?

rdstern commented 1 year ago

@Vitalis95 we have workshops that could use this dialogue, so I hope it can get into the next upgrade

rdstern commented 1 year ago

@Vitalis95 I discussed with Danny, why I wanted the dialogue to just produce a data frame and not do the export directly. He rejected my arguments - and I am afraid correctly. So we will need the dialogue to be able to export. And I am now adding more as I write this too.

However, this may take longer than the next version, so we may temporarily include a version that just produces the data frame if there isn't time to do everything.

a) So the title you had at the top of the dialogue to export is correct.
b) I argued that one reason for the 2-stage process was that we may want also to export the Comments sheet. This is always called .comments. Danny feels that's still ok to export. So we need a checkbox Export Comments Dataframe also. Default is unchecked. If checked, it only exports it if there is one. Otherwise it ignores the setting of the checkbox.
c) We need the New Data Frame Name to have a checkbox.
d) Then there is an Export Data Frame(s) with a checkbox.
e) At least one of these must be checked. They may both be.
f) So, for now, you could just have the New Data Frame Name checked and the other disabled.
g) I argued that we may want to export either as csv, or as Excel. Danny argues that's fine and it could be either. I am not sure what happens when you click Ok on the export. Do we need to add the file names into the dialogue, or does it call the ordinary export dialogue? Maybe @Patowhiz can advise.
h) That made me wonder about exporting other information. For example, how should we provide the station information if we have a station file in R-Instat? Could we add that? Is it worth it? Another question for @Patowhiz. And useful, because you can't import data for a new station until it has the Station details?
i) If so, then (later?) perhaps we need the top radio buttons. That could have Station, then Daily. Then perhaps others as well, particularly Hourly - or maybe Within Day. That would export perhaps hourly data, which I assume goes to another form? And 3-hour data, etc. What are the other time scales that we might have?
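
The checkbox rules in items (b) to (e) can be written down as a small piece of logic. The sketch below is in Python purely for illustration and is not R-Instat's dialog code. The class, field and function names are hypothetical; only the `.comments` data frame name comes from this comment.

```python
from dataclasses import dataclass

COMMENTS_FRAME = ".comments"  # name of the comments data frame, per item (b)


@dataclass
class DialogState:
    create_data_frame: bool = True    # "New Data Frame Name" checkbox
    export_data_frames: bool = False  # "Export Data Frame(s)" checkbox
    export_comments: bool = False     # "Export Comments Dataframe" checkbox


def is_valid(state):
    """Item (e): at least one of the two main checkboxes must be checked."""
    return state.create_data_frame or state.export_data_frames


def frames_to_export(state, main_name, available_frames):
    """Return the data frame names that would be exported.

    Item (b): the comments frame is only added if it exists; otherwise
    the checkbox setting is ignored.
    """
    if not state.export_data_frames:
        return []
    names = [main_name]
    if state.export_comments and COMMENTS_FRAME in available_frames:
        names.append(COMMENTS_FRAME)
    return names


# Example: export requested with comments, and a comments frame exists.
state = DialogState(export_data_frames=True, export_comments=True)
print(is_valid(state), frames_to_export(state, "climsoft_data", {"climsoft_data", ".comments"}))
```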

Vitalis95 commented 1 year ago

@Patowhiz, what is your take on g & h above?

Patowhiz commented 1 year ago

Climsoft only imports csv files at the moment. So no need for a file type option. It can be any file name. Yes, Climsoft can import station metadata. But I don't think we necessarily need that now. Daily and hourly data are all imported by the same dialog in Climsoft. In Climsoft we use the period field to distinguish the 2. Please note the date column should have the correct hour of observation included.

Vitalis95 commented 1 year ago

@Patowhiz, thanks. Since Climsoft only takes csv for now, what about the comments sheet, given that csv doesn't support workbooks that contain multiple sheets? How can we export the comments sheet?

Patowhiz commented 1 year ago

I think that can be exported in any format, it won't be imported into Climsoft.

rdstern commented 1 year ago

Aha @Vitalis95 when I use @Patowhiz export data I can export multiple files to csv. So I wonder if the export option in your dialogue just calls that dialogue, but without the option to change the file-type from csv. The csv is anyway the default, so perhaps it can just call the dialogue as it is. Then you have to specify the directory and then you return to the dialogue to click ok. Perhaps in the export you return to your export and then have to click ok from there.
And "finishing" that dialogue might be a good task for the forthcoming sprint.

rdstern commented 1 year ago

@Vitalis95 could you check whether the current export works for sub-daily data? From the R library openair has mydata and that is hourly. Maybe you could try with that?

Patowhiz commented 1 year ago

@rdstern @Vitalis95 I'm keen to have this dialog as simple as possible. Climsoft users will primarily export data in 2 scenarios;

  1. To import it as new data in Climsoft. Columns expected are: station_id, element_id, level, datetime, value, flag.
  2. To import it as quality controlled data. Columns expected are: station_id, element_id, datetime, value, flag, qc_log.
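
For the first scenario, a reshape from one-column-per-element data into the listed columns could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the dialog's implementation. The function name, the input record keys, the hour of 6 and the choice to skip missing values are all hypothetical. The element id 5 for rain is the example given earlier in this thread; the id used for tmax is a placeholder, since codes can differ between Met services.

```python
from datetime import date, datetime

# Column order for scenario 1, as listed above.
COLUMNS = ["station_id", "element_id", "level", "datetime", "value", "flag"]


def stack_for_import(records, element_ids, level="surface", hour=6):
    """Stack element columns into one row per station, element and datetime.

    Assumptions: records have 'station' and 'date' keys, missing values are
    None and are skipped, and every observation is given the same hour.
    """
    rows = []
    for rec in records:
        d = rec["date"]
        stamp = datetime(d.year, d.month, d.day, hour).isoformat(sep=" ")
        for column, element_id in element_ids.items():
            value = rec.get(column)
            if value is None:
                continue  # assumption: missing observations are not exported
            rows.append({
                "station_id": rec["station"],
                "element_id": element_id,
                "level": level,
                "datetime": stamp,
                "value": value,
                "flag": "",
            })
    return rows


# Example with made-up data; element ids are placeholders to be matched
# to the codes used in the target Climsoft database.
sample = [{"station": "S001", "date": date(2022, 6, 14), "rain": 1.2, "tmax": None}]
for row in stack_for_import(sample, {"rain": 5, "tmax": 2}):
    print(row)
```

The second scenario would use the same reshape with the level column dropped and a qc_log column added.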

Regardless of the temporal nature of the data, Climsoft always stores the data as a date time instance. We recommend users to use different element ids for the same "element type" to differentiate the temporal difference.

I suggest having this dialog as a reshape or transform dialog that caters for the 2 scenarios above. In Climsoft we now have the 2 dialogs that can be used to import the "kinds" of data. Then just reuse the export dialog for exporting data.