Closed gabrielareto closed 1 year ago
Errors when loading again, as input, the dataset and profile just returned by the app
Did they use the "App's profile" in Headers and Units instead of their profile (as returned by the app after the first interaction with it)?
I did not record that.
Shouldn't it work both ways?
No. The user's profile maps user data to standard data. The app's profile maps standard data to standard data (no conversion).
If the user inputs standard data and uses their own profile, the app will look for the user's original columns in a file that does not have them, which causes all kinds of issues.
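To make the difference between the two profiles concrete, here is a minimal sketch in Python. It is not the app's code: the profile layout (a dict from standard name to input column name), the column names, and the `standardize` helper are all assumptions made for illustration.

```python
# Hypothetical profiles: each maps a standard column name to the column
# name expected in the input file. All names here are illustrative only.
USER_PROFILE = {"site": "Plot_ID", "dbh": "Diameter_cm"}  # user data -> standard
APP_PROFILE = {"site": "site", "dbh": "dbh"}              # standard -> standard


def standardize(rows, profile):
    """Rename input columns to standard names according to a profile."""
    out = []
    for row in rows:
        # Every column the profile points at must exist in the input row.
        missing = [src for src in profile.values() if src not in row]
        if missing:
            raise KeyError(f"input is missing columns: {missing}")
        out.append({std: row[src] for std, src in profile.items()})
    return out


raw = [{"Plot_ID": "A1", "Diameter_cm": 12.5}]
standard = standardize(raw, USER_PROFILE)   # first pass: user data in, standard data out
again = standardize(standard, APP_PROFILE)  # second pass with the app's profile works

try:
    standardize(standard, USER_PROFILE)     # second pass with the user's profile
except KeyError as err:
    second_pass_error = str(err)            # the original columns are no longer there
```

In this sketch the second pass only succeeds with the app's profile, because the standardized file no longer contains `Plot_ID` or `Diameter_cm`.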
right, thanks!
It is a bit confusing that the output is [data in format 2] and [profile that describes format 1].
It is also a vulnerable point, because the user can forget (or lose) the format of their output.
I think it would be easier for the user if we work with pairs of {data in format X, description of format X}. For example if the output is structured into subfolders:
would this be a problem?
I will try to do that
related to this idea of pairs of {data in format X, description of format X}:
It may be clearer for the user, and easier to understand how the app works, if they declare the input profile and the output profile at the same time, not in different tabs. Something like:
1- Load your data.
2- Profiles etc.
Do you have a description of your input? Pick from the list, or provide an rds object that blah blah blah.
Do you have a description of your desired output? Pick from the list, or provide an rds object that blah blah blah. This is typically a consensus format in collaborations, and may be provided to you by the person(s) responsible for data aggregation. It could also be the format of a network or repository in which you want to integrate your data.
3- Headers and units. In case you want to create/update the description of the input data.
etc.
are there reasons why this could be more difficult for us or the users than the current design?
here is what the files are in the output now (for a case where I uploaded 3 tables to stack and merge, and gave them the name you see):
I think this helps, thanks. Can you try to pass through the app again using output_data.csv and outputProfile.rds as your new input? For example, as if you were to run the corrections on a second phase. It should always work, right?
I tried. The app works but it does show a big warning that the profile doesn't seem to apply to the data.
I relaxed the error alert, so it doesn't show unless the match between data and profile is really bad. I also added some info to the profile to record when the uploaded profile was originally the app's profile.
For more context, the app's profile has all possible entries filled out, but I only give the user the relevant columns in the processed data, so when the user uploads that and uses the app's profile as input, there are a lot of columns "missing" compared to what that profile expects (other custom profiles would have most entries as "none"). I had a threshold of 20 missing columns, above which it could really be that the data is not in the app's profile format. But since I have been adding a lot of column options over time, I think 20 became too restrictive. I changed it to 50.
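A rough sketch of the kind of check described above, written in Python for illustration only. The function names and the profile layout (a dict whose values are expected column names, with "none" for unused entries) are assumptions; only the thresholds 20 and 50 come from this thread.

```python
MISSING_COLUMN_THRESHOLD = 50  # value mentioned in the thread (previously 20)


def missing_columns(data_columns, profile):
    """Return the columns a profile expects that are absent from the data.

    Profile entries set to "none" are treated as unused and skipped.
    """
    present = set(data_columns)
    return [col for col in profile.values() if col != "none" and col not in present]


def profile_mismatch(data_columns, profile, threshold=MISSING_COLUMN_THRESHOLD):
    """True when more than `threshold` expected columns are missing."""
    return len(missing_columns(data_columns, profile)) > threshold


# Hypothetical app profile with 60 filled entries, and processed data that
# only keeps 30 of those columns.
app_profile = {f"col{i}": f"col{i}" for i in range(60)}
processed_columns = [f"col{i}" for i in range(30)]

warn_old = profile_mismatch(processed_columns, app_profile, threshold=20)
warn_new = profile_mismatch(processed_columns, app_profile)

# A custom profile with most entries set to "none" expects very few columns.
custom_profile = {f"col{i}": ("Height_m" if i == 0 else "none") for i in range(60)}
custom_missing = missing_columns(["Height_m"], custom_profile)
```

With these made-up numbers, 30 columns count as missing, which would trigger the warning at a threshold of 20 but not at 50.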
let me know if you get more feedback on this issue. I'll close it for now.
I understand, thanks.
I think it is ok as it is, as long as the warning just means that. The data federation will require proper merging, not just stacking. That is a good practice.
Some thoughts for the record:
Under the approach of having pairs of {data in format X, description of format X}, it would be desirable to have a perfect match between the data and its description. That would require subsetting the columns in the same way both in the data and in the profile.
Proper merging would work exactly the same, but a test of "is profile 1 = profile 2" would return FALSE. Is there any need for profiles that refer to different subsets of columns to be exactly the same? Can we foresee any step in a real data federation project in which the match between profiles (.rds objects) can play a role?
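As an illustration of why a strict equality test returns FALSE for subsetted profiles, here is a small Python sketch. The profile layout (a dict of per-column descriptions) and the `profiles_agree` helper are hypothetical; the weaker comparison is only one possible alternative, not something the app is known to do.

```python
def profiles_agree(profile_a, profile_b):
    """True when both profiles describe every column they share identically."""
    shared = profile_a.keys() & profile_b.keys()
    return all(profile_a[key] == profile_b[key] for key in shared)


# Hypothetical profiles: profile_2 is profile_1 subsetted to the columns
# actually present in one dataset.
profile_1 = {
    "dbh": {"column": "dbh", "unit": "cm"},
    "height": {"column": "height", "unit": "m"},
}
profile_2 = {"dbh": {"column": "dbh", "unit": "cm"}}

strict_equal = profile_1 == profile_2               # differs: one entry is missing
compatible = profiles_agree(profile_1, profile_2)   # shared entries match
```

A comparison restricted to shared entries would still flag real conflicts, such as the same column declared in different units.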
One alternative is to never delete empty columns, and return sparse datasets that (1) match perfectly the complete version of the profile and, therefore, (2) could be merged by simply stacking. This seems a more general solution. For example, teams could use the stacking step as a way to merge their datasets within the app, before running the corrections on the aggregated data. How important is saving space for us?
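The sparse-dataset alternative can be sketched as follows, again in Python and under assumed names: `FULL_COLUMNS` stands in for the complete column list of the app's profile, and rows are plain dicts rather than the app's actual tables.

```python
# Hypothetical complete column set, standing in for the app's full profile.
FULL_COLUMNS = ["site", "dbh", "height", "date"]


def to_sparse(rows, columns=FULL_COLUMNS):
    """Pad each row with None so every row carries the complete column set."""
    return [{col: row.get(col) for col in columns} for row in rows]


# Two teams with different subsets of columns.
team_a = to_sparse([{"site": "A1", "dbh": 12.5}])
team_b = to_sparse([{"site": "B7", "height": 3.2, "date": "2021-05-04"}])

# Because both tables now share the same columns in the same order,
# merging reduces to stacking the rows.
stacked = team_a + team_b
```

The trade-off is the one raised in the thread: every table carries all columns, most of them empty.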
> That would require subsetting the columns in the same way both in the data and in the profile.
I thought about that but profiles hold more than column names. They also have units info, date format info etc...
getting a whole bunch of empty columns is not pleasant.... I originally had that but got negative feedback on it so I removed them....
We could have it as an option so people can use it as you say... but in that case we would need to get rid of all the "_original" columns, which will be dataset specific...
ok, let's keep it as it is now. Thanks!
Mentioned in #43, but an issue in itself:
People get errors when loading again, as input, the dataset and profile just returned by the app. This means that the process of [running the app for reformatting] and then, in a second independent round, [running the app for data corrections] is not smooth.