hhpmmd opened this issue 3 years ago
You are absolutely right that this is a weakness of the app as it stands. I would like to broaden the set of data attributes that it is possible to graph. Differentials and integrals are a good start, but there are also things like tracking frequency or the number of data points tracked (which could be useful for naturally unordered data like 0: Oranges, 1: Apples, 2: Pears, etc.), and then there's the time of day something is tracked or the time between trackings. I'm not saying you need to implement these, but whatever solution is used here should scale well to other data attributes.
However I think I would prefer that this was not added to the tracking side because:
Since all of these transformations can be computed on the fly without needing to change the database structure at all, we should be able to put these options almost anywhere.
My first idea was to add this option to the graph side. So when a user selects which data set they want to graph, instead of bringing up a big list of data sets we bring up a UI that lets you select both a data set and a data attribute. But over time I have gone off this idea. I think this approach creates some level of technical debt, because ultimately there are lots of similar problems I would like to solve. Fundamentally, there are lots of different ways that people want their data to be transformed before it is graphed, so combining the transformation and graphing stages gets exponentially more complicated as you try to support more and more users' requirements. How do you combine totals and averaging and differentials and offsets and scalars? What happens when someone wants an option to tweak the order of operations? Trying to support too many things in the graphing interface will require countless new checkboxes and drop-downs that would make the app unusably messy for most users.
My current best idea is more involved but it is something I have been thinking about for a while which has the potential to solve all sorts of problems at once. Here are some examples of people asking for new ways to transform/combine their data:
https://github.com/SamAmco/track-and-graph/issues/7 https://github.com/SamAmco/track-and-graph/issues/49 https://github.com/SamAmco/track-and-graph/issues/61
I have many more in emails. For example, a request I get a lot is that users want to be able to see data combined from various sources, e.g. if I track wine, beer and whiskey separately, how do I draw a graph of total alcohol? What needs to happen is that data transformation should be separate from visualisation (so technically it should be Track&Transform&Graph). So what I would really like to do is add another top-level menu screen called "Functions" (alongside Home, Reminders, Notes, etc.). The Functions screen would again be a list view with a plus button in the top right which would create a new function. The function would take a name (the name of the output data) and a text input that represents a transformation of one or many input data sets. I haven't thought through how this would work yet, but it could allow you to create entries something like:
Alcohol = daily_sum((whiskey 3), (wine 2), beer)
Weight changes = differential(weight)
Weight changes weekly = differential(total(weight, WEEKLY))
etc. etc.
I'm sure you can appreciate the power here but a few of the advantages I see are:
You could then select any of the data created by a function in the data set drop-down when creating a graph/visualisation. So each function must simply output two-dimensional data (a value and a timestamp), as if it were any other tracked data.
This approach is obviously much more involved and requires the development of some composable expandable language interpreter. I have never personally written anything like this so I'm not entirely sure how much work would be required. On the plus side it's much more powerful and kills many birds with one stone.
One final note is that either way the documentation will need updating. There is an FAQ in the app that a lot of new users find helpful so it needs to be kept up to date and it must document well exactly how everything works.
Perhaps this has put you off but I was also hoping maybe it would inspire you :P .. I have no real time for this project now unfortunately so I will not be implementing this any time soon, but if you're interested I will support you in any way I can.
Yeah, I also had a basically very similar idea of having these programmable trackers and I think you are right in the conclusion that in the end this might be the only logical next step in regards to these issues.
I did some minimal research and found that having an interpreter in an android app has been done before, so it would be possible to do and I also think writing the interpreter itself (parsing and computing the input from the user) is an interesting challenge that I could probably motivate myself to do.
However I think there is probably also a lot of UI and some internal stuff to be done which a) I'm not sure how much work it is, b) might be harder for me to get into/I need more understanding of the underlying architecture and c) I'm historically less motivated to do.
Can you give an estimate of how much work you think adding the new UI stuff / integrating everything into the system might be? And ideally, could you see yourself doing some UI/integration work in the near future if I were to implement an interpreter?
Well if you have the motivation that would be awesome. I think it really comes down to the specifics of the proposal.
Wrt integration: if all we have is a new table in the db for functions that just contains ID, name and functionText (or something simple like this) and then from the graphing side we access the data the exact same way as we do any other feature (i.e. there is a layer of abstraction that returns the data for any feature or function) then most of the work should be in defining/writing/documenting the language/parser. I can try and give you pointers on where to look for stuff architecture wise.
I'm sure the UI wouldn't be too much trouble for me to do at that point. I can only really find a few hours in a week for this but even still I envision UI to be quite minimal.
I would be interested in seeing what you've found re interpreters in other apps?
Okay, so I found this blog post (https://tomassetti.me/jariko-an-rpg-interpreter-in-kotlin/), which uses ANTLR (https://www.antlr.org/), a tool that appears to be able to parse 'any' grammar you design. Since the grammar needed here should be pretty simple (a function name and arguments in brackets for most, if not everything), there is probably already some grammar that only needs minor changes. Since ANTLR is so general, this would be the first thing I'd try to make work.
There are also some projects in which the grammars are hard coded:
So I think the blogpost and the first project are two good starting points.
I haven't checked (and it isn't really my expertise) whether the licenses are compatible with this project. It would be nice if you could give me a heads up on whether either source is OK to use.
Nice work, I will try and take a look at them all soon and get back to you.
Antlr looks like it's probably a good idea. The license: https://github.com/antlr/antlr4/blob/master/LICENSE.txt is pretty permissive so that shouldn't be an issue as long as we include it in the project.
Before we get into implementation I think we need to find good examples of simple data transformation languages to draw on. We want something very simple and elegant that most users can learn easily but with enough flexibility to allow us to expand on it. I will try and think about some of the requirements and desired functions and collect a list in this thread probably this weekend. After that I will try to collect some good examples of simple languages/grammars that might be best in this thread also.
First I would like to define the data we expect to be working with in any/all functions. All data sets are a list of data points ordered by time from oldest to newest where a data point is an object containing:
Any input or output data set will be either:
(Note that no data can be considered regular unless it has been transformed by a function that declares its output as regular. Note also that regular data must be associated with a period and that not all functions can take multiple operands that are regular but with different periods.)
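A minimal Python sketch of this data model, assuming a data point carries a timestamp, a numeric value and a note, and that periods are DAILY, WEEKLY and MONTHLY (`DataPoint`, `DataSet`, `Period` and `require_same_period` are hypothetical names, not the app's classes). It encodes the two rules in the note: regular data always carries a period, and functions that combine two regular operands reject mismatched periods.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Period(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: float
    note: str = ""  # the notes field; other metadata is out of scope here


@dataclass
class DataSet:
    points: List[DataPoint] = field(default_factory=list)
    # None means irregular; regular data must always carry its period.
    period: Optional[Period] = None

    def __post_init__(self) -> None:
        stamps = [p.timestamp for p in self.points]
        if stamps != sorted(stamps):
            raise ValueError("data points must be ordered from oldest to newest")

    @property
    def is_regular(self) -> bool:
        return self.period is not None


def require_same_period(a: DataSet, b: DataSet) -> Period:
    """Check the precondition for functions that combine two regular operands."""
    if a.period is None or b.period is None:
        raise ValueError("both operands must be regular")
    if a.period is not b.period:
        raise ValueError(f"period mismatch: {a.period.name} vs {b.period.name}")
    return a.period
```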
In addition any input/output data will have one of the following types:
I will list here some of the most commonly requested data transformations, or tools to facilitate users' common requests in the most versatile way:
Input- data: [Irregular|Regular]&[Numerical], n: Number
Output- [Irregular|Regular]&[Numerical]
Description- The same operation applied to each data point e.g. adding 1 to all data points
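A minimal Python sketch of this per-point operation, assuming points are timestamp/value pairs and the operation is passed in as a function; `scalar_op` and `DataPoint` are hypothetical names, not existing app code.

```python
import operator
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: float


def scalar_op(
    data: List[DataPoint],
    n: float,
    op: Callable[[float, float], float] = operator.add,
) -> List[DataPoint]:
    """Apply op(value, n) to every point.

    Timestamps are kept as-is, so regular input stays regular with the same period.
    """
    return [DataPoint(p.timestamp, op(p.value, n)) for p in data]
```

Adding 1 to every point is `scalar_op(points, 1)`; scaling is `scalar_op(points, 3, operator.mul)`.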
Input- data: [Irregular|Regular]&[Time], t: TimeUnit
Output- [Irregular|Regular]&[Numerical]
Description- Returns the number of time units each data point represents e.g. if the data point has value 00:03:00 and the time unit t represents minutes then the output would be 3
Input- data: [Irregular|Regular]&[Numerical], t: TimeUnit
Output- [Irregular|Regular]&[Time]
Description- Returns amount of time each data point represents e.g. if the data point has value 3 and the time unit t represents minutes then the output would be 00:03:00
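The two unit conversions above are each a per-point mapping of the value. A minimal sketch, assuming durations are represented as Python `timedelta` values and that the hypothetical `TimeUnit` enum holds the length of one unit:

```python
from datetime import timedelta
from enum import Enum


class TimeUnit(Enum):
    SECONDS = timedelta(seconds=1)
    MINUTES = timedelta(minutes=1)
    HOURS = timedelta(hours=1)
    DAYS = timedelta(days=1)


def time_to_number(value: timedelta, t: TimeUnit) -> float:
    """e.g. 00:03:00 in MINUTES -> 3.0 (timedelta / timedelta yields a float)."""
    return value / t.value


def number_to_time(value: float, t: TimeUnit) -> timedelta:
    """e.g. 3 in MINUTES -> 00:03:00."""
    return t.value * value
```

Applying either function to each point's value, and keeping its timestamp, gives the conversions described in the two specs.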
Input- data: [Irregular]&[Time|Numerical], p: Period
Output- [Regular<with period p>]&[Time|Numerical]
Description- Calculate the total of all data points per period p and return a regular form of the input data with period p.
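A minimal sketch of this periodic total, assuming DAILY/WEEKLY/MONTHLY periods, weeks starting on Monday, each output point stamped with the start of its period, and empty periods emitted as 0 so that the output has exactly one point per period. All of these are assumptions, only numeric values are handled, and the names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, List


class Period(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: float


def period_start(ts: datetime, p: Period) -> datetime:
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if p is Period.DAILY:
        return day
    if p is Period.WEEKLY:
        return day - timedelta(days=day.weekday())  # assumes weeks start on Monday
    return day.replace(day=1)


def next_period_start(start: datetime, p: Period) -> datetime:
    if p is Period.DAILY:
        return start + timedelta(days=1)
    if p is Period.WEEKLY:
        return start + timedelta(weeks=1)
    # Day 28 + 4 days always lands in the next month; then snap to its 1st.
    return (start.replace(day=28) + timedelta(days=4)).replace(day=1)


def periodic_total(data: List[DataPoint], p: Period) -> List[DataPoint]:
    """Sum values per period, emitting one point per period (empty periods as 0)."""
    if not data:
        return []
    totals: Dict[datetime, float] = {}
    for point in data:
        key = period_start(point.timestamp, p)
        totals[key] = totals.get(key, 0.0) + point.value
    out = []
    current = period_start(data[0].timestamp, p)
    last = period_start(data[-1].timestamp, p)
    while current <= last:
        out.append(DataPoint(current, totals.get(current, 0.0)))
        current = next_period_start(current, p)
    return out
```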
Input- data: [Irregular|Regular]&[Time|Numerical], p: Period
Output- [Regular|Irregular]&[Time|Numerical]
Description- For each data point calculate the average of it and all data points prior to it that fall within the time period p.
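A minimal sketch of this moving average, assuming the period p is a trailing window of fixed length (a `timedelta`) measured back from each point, with the boundary inclusive; `moving_average` is a hypothetical name. Output points keep the input timestamps, so regular input stays regular.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Deque, List


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: float


def moving_average(data: List[DataPoint], window: timedelta) -> List[DataPoint]:
    """For each point, average it with all earlier points at most `window` older."""
    out: List[DataPoint] = []
    recent: Deque[DataPoint] = deque()
    for point in data:  # data is ordered oldest -> newest
        recent.append(point)
        # Drop points that have fallen out of the trailing window.
        while point.timestamp - recent[0].timestamp > window:
            recent.popleft()
        # Summing the window directly avoids float drift from a running total.
        avg = sum(p.value for p in recent) / len(recent)
        out.append(DataPoint(point.timestamp, avg))
    return out
```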
Input- data: [Irregular]&[Time|Numerical|Text], from: Optional<List<[Time|Numerical|Text]>>, to: Optional<List<[Time|Numerical|Text]>>
Output- [Irregular]&[Time]
Description- For every data point output a data point that represents the time since the last data point tracked. If from and to are not defined we simply find the time between each pair of data points. However the function can take two lists of operands (from and to) that allow you to determine the time between given values e.g. for every data point with label "lunch" or "dinner" get the time since the last data point marked "breakfast" or "lunch". If from is defined but to is not then you calculate for each data point that matches a value in from the time since the last data point of any value. If to is defined but not from then you calculate for each data point the time since the last data point that matches any value in to.
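A minimal sketch of the time-between rules described above, assuming values are compared by equality and that `from_values` filters the point being measured while `to_values` filters the earlier point it is measured back to, which is how the single-list cases read. `time_between` is a hypothetical name.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Collection, List, Optional


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: object  # text label, number or duration


def time_between(
    data: List[DataPoint],
    from_values: Optional[Collection[object]] = None,
    to_values: Optional[Collection[object]] = None,
) -> List[DataPoint]:
    """For each point matching `from_values` (or every point if None), output the
    time since the most recent earlier point matching `to_values` (or any point)."""
    out: List[DataPoint] = []
    last_match: Optional[DataPoint] = None
    for point in data:  # oldest -> newest
        if last_match is not None and (from_values is None or point.value in from_values):
            out.append(DataPoint(point.timestamp, point.timestamp - last_match.timestamp))
        # Update after emitting so we only ever measure back to *earlier* points.
        if to_values is None or point.value in to_values:
            last_match = point
    return out
```

For the breakfast/lunch/dinner example, `time_between(meals, ["lunch", "dinner"], ["breakfast", "lunch"])` yields one duration per lunch and dinner.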
Input- data: [Irregular|Regular]&[Time|Numerical]
Output- [Irregular|Regular]&[Time|Numerical]
Description- For every data point get the difference in value between it and the last data point tracked
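A minimal sketch of this delta, assuming the oldest point is simply dropped because it has no predecessor; `delta` is a hypothetical name. The same code works for durations, since subtracting two Python `timedelta` values gives a `timedelta`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: float


def delta(data: List[DataPoint]) -> List[DataPoint]:
    """Difference between each point and the one tracked before it.

    The oldest point has no predecessor, so it produces no output point.
    """
    return [
        DataPoint(cur.timestamp, cur.value - prev.value)
        for prev, cur in zip(data, data[1:])
    ]
```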
Input- data: [Irregular|Regular]&[Time|Numerical], p: Optional<Period>
Output- [Irregular|Regular]&[Time|Numerical]
Description- For every data point output a data point that represents the accumulated sum of this data point and all data points prior within the given time period p. If no period p is given then the period is regarded as infinite.
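A minimal sketch of this accumulated sum, assuming the optional period p is a trailing window of fixed length (`None` meaning infinite). It precomputes prefix sums and uses binary search on the ordered timestamps to find where each window starts; `running_total` is a hypothetical name.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import accumulate
from typing import List, Optional


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: float


def running_total(data: List[DataPoint], window: Optional[timedelta] = None) -> List[DataPoint]:
    """Sum of each point and all earlier points within `window` (None = since the start)."""
    prefix = [0.0, *accumulate(p.value for p in data)]  # prefix[i] = sum of first i values
    stamps = [p.timestamp for p in data]
    out = []
    for i, point in enumerate(data):
        # Index of the oldest point still inside the window (inclusive bound).
        first = 0 if window is None else bisect_left(stamps, point.timestamp - window)
        out.append(DataPoint(point.timestamp, prefix[i + 1] - prefix[first]))
    return out
```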
Input- data1: [Regular<p1>]&[t1: Numerical|Time], data2: [Regular<p2>]&[t2: Numerical|Time] where t1==t2 and p1==p2
Output- [Regular<p1>]&[t1]
Description- For each data point a in data1, find the data point b in data2 with the same timestamp and output a data point that is the result of the operation on a and b.
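A minimal sketch of this two-operand operation, assuming periods are compared by name and that timestamps present in only one operand are skipped; `binary_op` is a hypothetical name. Subtracting one tracker from another is then `binary_op(..., operator.sub)`.

```python
import operator
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: float


def binary_op(
    data1: List[DataPoint], period1: str,
    data2: List[DataPoint], period2: str,
    op: Callable[[float, float], float],
) -> List[DataPoint]:
    """Combine two regular data sets point by point on matching timestamps."""
    if period1 != period2:
        raise ValueError(f"period mismatch: {period1} vs {period2}")
    by_time = {b.timestamp: b.value for b in data2}
    # Timestamps missing from data2 are skipped (an assumption of this sketch).
    return [DataPoint(a.timestamp, op(a.value, by_time[a.timestamp]))
            for a in data1 if a.timestamp in by_time]
```

Note that with `operator.truediv` a zero denominator raises `ZeroDivisionError` in Python, so the division-by-zero / NaN question raised later in the thread still needs a decision.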
Input- data: [Regular|Irregular]&[Time|Numerical|Text], List<[Time|Numerical|Text]>
Output- data: [Irregular]&[Time|Numerical|Text]
Description- Given a list of data points return a list containing all the data points that match any value in the given filter list.
Input- data: [Regular|Irregular]&[Time|Numerical|Text], List<[Time|Numerical|Text]>
Output- data: [Irregular]&[Time|Numerical|Text]
Description- Given a list of data points return a list containing all the data points that don't match any value in the given filter list.
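The two filters above differ only in whether a match keeps or drops the point, so a minimal sketch can cover both with one flag (`filter_values` and `exclude` are hypothetical names). Values must be hashable, which holds for strings, numbers and `timedelta`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: object  # text label, number or duration


def filter_values(data: List[DataPoint], values: Iterable[object],
                  exclude: bool = False) -> List[DataPoint]:
    """Keep points whose value is in `values` (or, with exclude=True, is not)."""
    wanted = set(values)
    # (value in wanted) != exclude flips the test when exclude is True.
    return [p for p in data if (p.value in wanted) != exclude]
```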
Data points may have some metadata. Right now I would say that is just the notes field, but it is possible that this could be expanded on in the future; for example, users have asked to add locations and images to their data points. We may need functions or grammar to allow us to define what specific information about a data set we are interested in when we pass it to a function. For example, if the function can take text input, are we interested in the multiple choice labels or the notes? Furthermore, we want to make sure that we propagate any metadata to the best of our ability through any function. For example, if you just add 1 to all the data points then keeping the notes isn't an issue, but if you calculate daily totals it becomes more difficult. I don't think we need to confront this question at this stage, though.
Right now the only function I have defined that converts irregular data to regular data is the Periodic Total. There are probably other desirable functions to convert irregular data to regular data, like "last value per period", "most common value per period" or "mean value per period".
This is obviously written in pseudocode that I made up as I went along, so please let me know if it is not clear enough. It would be good to align with you on these functions. Are there any I have missed? Are there any that can be broken down into better fundamental functions? I will await your thoughts on this.
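A minimal sketch of a general irregular-to-regular helper, assuming data is grouped by a bucket function such as "start of day" and each group is reduced by any function of its values; `aggregate`, `day_start`, `last` and `most_common` are hypothetical names. With `sum` as the reducer this behaves like the Periodic Total, and with `statistics.mean`, `last` or `statistics.mode` it gives the other per-period conversions listed above.

```python
import statistics
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: float


def day_start(ts: datetime) -> datetime:
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def aggregate(
    data: List[DataPoint],
    bucket: Callable[[datetime], datetime],
    reducer: Callable[[Sequence[float]], float],
) -> List[DataPoint]:
    """Group points by bucket(timestamp) and reduce each group to one value."""
    out = []
    # groupby only merges adjacent items, which is fine because data is time-ordered.
    for start, group in groupby(data, key=lambda p: bucket(p.timestamp)):
        values = [p.value for p in group]
        out.append(DataPoint(start, reducer(values)))
    return out


def last(values: Sequence[float]) -> float:
    """"Last value per period"."""
    return values[-1]


most_common = statistics.mode  # on ties, Python 3.8+ returns the first value seen
```

Periods with no data produce no point here; strictly regular output would also need empty periods filled in.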
Here are some of my thoughts in basically random order:
A filter could take `data` like you defined it and then a condition, which is a function that takes a single data point and returns a boolean. We could offer factories for these functions, like `filter(my_data, timeIsBetween("Friday", "17:00", "Sunday", "23:00"))`, where `timeIsBetween` could return such a filter function.
We could also have `aggregate(my_data, Week, median)` and have functions for median, average, sum, min, max, latest, earliest and so on.
There is a `union` operation missing, for when I want to join `reading_book_a` and `reading_book_b` into `reading`.
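A minimal sketch of the filter-factory and union suggestions above, assuming `timeIsBetween` describes a weekly window (written here as a Python factory `time_is_between` that returns a predicate) and that `union` interleaves data sets that are already time-ordered. All names are hypothetical; `heapq.merge` keeps the result ordered without re-sorting.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


@dataclass(frozen=True)
class DataPoint:
    timestamp: datetime
    value: float


def _minute_of_week(day: str, hhmm: str) -> int:
    hours, minutes = map(int, hhmm.split(":"))
    return DAYS.index(day) * 24 * 60 + hours * 60 + minutes


def time_is_between(from_day: str, from_time: str,
                    to_day: str, to_time: str) -> Callable[[DataPoint], bool]:
    """Factory returning a predicate that is True inside a weekly time window."""
    start = _minute_of_week(from_day, from_time)
    end = _minute_of_week(to_day, to_time)

    def predicate(point: DataPoint) -> bool:
        ts = point.timestamp
        m = ts.weekday() * 24 * 60 + ts.hour * 60 + ts.minute
        if start <= end:
            return start <= m <= end
        return m >= start or m <= end  # window wraps past Sunday midnight

    return predicate


def filter_points(data: List[DataPoint],
                  condition: Callable[[DataPoint], bool]) -> List[DataPoint]:
    return [p for p in data if condition(p)]


def union(*data_sets: Iterable[DataPoint]) -> List[DataPoint]:
    """Merge already time-ordered data sets into one time-ordered data set."""
    return list(heapq.merge(*data_sets, key=lambda p: p.timestamp))
```

Joining the two reading trackers would then be `union(reading_book_a, reading_book_b)`.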
These are all good ideas. I will try to get back to you in more depth soon.
In terms of finding other languages to draw on, I'm not sure that what we are trying to achieve here is quite close enough to any existing language to warrant using it. Some of our constraints are as follows:
With these things in mind I have one idea. Suppose a function was composed of lines of the form:
<variable_name> = Function( .variable1, .variable2, CONSTANT)
Where the `.` symbol is used to reference a variable, which may be a variable previously declared in this function or a data set. Constants (like WEEK or MONTH, for example) are in caps. My hope here is that we can avoid any use of nested functions. The last line of the function could simply be of the form:
Function( .variable3, ... )
or
.variable3
But in any case the last line represents the final output of the function.
The `.` and `,` keys are normally easy to reach on a mobile keyboard, which makes this form of referencing easier on mobile. I also think there will be some UI work to do. For example, it is probably necessary to suggest a list of available data sets when the user types a `.`, and I think some syntax highlighting would go a long way. We will also need good error reporting, which means the interpreter will need a good way of reporting errors back to the caller.
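A minimal sketch of a parser for the line-based syntax proposed above, assuming arguments can only be `.references`, CAPITALISED constants, quoted strings or numbers, with no nesting. It checks each `.reference` against earlier variables and known data set names and reports errors with line numbers. It is hand-rolled with Python's `re` module purely for illustration, not the ANTLR grammar discussed in this thread, and `parse`, `Statement` and `ParseError` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# An argument is a .reference, a CONSTANT, a "string" or a number.
_ARG = r'(?:\.[A-Za-z_]\w*|[A-Z][A-Z0-9_]*|"[^"]*"|-?\d+(?:\.\d+)?)'
_CALL = re.compile(rf'([A-Za-z_]\w*)\(\s*({_ARG}(?:\s*,\s*{_ARG})*)?\s*\)')
_REF = re.compile(r'\.[A-Za-z_]\w*')
_ASSIGN = re.compile(r'([A-Za-z_]\w*)\s*=\s*(.+)')


@dataclass
class Statement:
    target: Optional[str]    # None on the final (output) line
    function: Optional[str]  # None when the line is a bare .reference
    args: List[str]


@dataclass
class ParseError:
    line: int
    message: str


def parse(source: str, data_sets: Set[str]) -> Tuple[List[Statement], List[ParseError]]:
    lines = [(n, raw.strip()) for n, raw in enumerate(source.splitlines(), start=1) if raw.strip()]
    statements: List[Statement] = []
    errors: List[ParseError] = []
    defined: Set[str] = set()
    for i, (lineno, text) in enumerate(lines):
        is_last = i == len(lines) - 1
        target = None
        assign = _ASSIGN.fullmatch(text)
        if assign:
            target, text = assign.group(1), assign.group(2).strip()
            if is_last:
                errors.append(ParseError(lineno, "the last line must be the output, not an assignment"))
        elif not is_last:
            errors.append(ParseError(lineno, "expected <variable_name> = Function(...)"))
            continue
        call = _CALL.fullmatch(text)
        if call:
            function, args = call.group(1), re.findall(_ARG, call.group(2) or "")
        elif _REF.fullmatch(text):
            function, args = None, [text]
        else:
            errors.append(ParseError(lineno, f"cannot parse '{text}'"))
            continue
        for arg in args:
            if arg.startswith(".") and arg[1:] not in defined and arg[1:] not in data_sets:
                errors.append(ParseError(lineno, f"unknown reference '{arg}'"))
        if target is not None:
            defined.add(target)
        statements.append(Statement(target, function, args))
    return statements, errors
```

For example, a misspelled reference such as `.free_lunces` comes back as an "unknown reference" error on its own line rather than failing silently.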
Some examples:
Union(.reading_book_a, .reading_book_b)

total_per_month = PeriodicTotal(.distance_run_per_day, MONTHLY)
Delta(.total_per_month)

filter_lunches = valueIs("Lunch")
filter_free_lunches = valueIs("Free lunch")
lunches = Filter(.meals, .filter_lunches)
free_lunches = Filter(.meals, .filter_free_lunches)
lunches_daily = PeriodicTotal(.lunches, DAILY)
free_lunches_daily = PeriodicTotal(.free_lunches, DAILY)
free_lunch_ratio_daily = Divide(.free_lunches_daily, .lunches_daily)
.free_lunch_ratio_daily
And so on. Does that make sense? Do you see any issues with this syntax?
Regarding the use of a custom language: I mean the language you describe is very simple and probably not that difficult to port to antlr, the other approach would be to take a language with way more features and just not handle the things we don't want.
So, just for example, if we ignore the `.` notation and just pass data sets by name only, I don't see why we can't use (again, for example) a Python parser/lexer, use only a subset of its functionality and report errors if the entered code is outside that subset. We don't even have to tell users that we are using the Python parser/lexer, so they don't get false expectations, and we don't have to write our own grammar in the end. However, it will make extensions to the language either very easy or very hard, depending on whether they can be parsed by the (e.g.) Python parser. I'll probably do some experiments regarding this during the weekend.
So the dots represent references, but I'm not sure whether there are cases where we pass unreferenced data to functions. It is nice for autocompletion, though.
How do we deal with division by zero? Nan/infinity? How do we plot such values?
Data sets don't have a fixed name right now. I guess we would have a UI element mapping data sets to variable names above the editor, or something like that?
`.`, for example: then as soon as the user types the `.` we can present them with a list of features to select from, and probably even behind the scenes we can use the feature ID in the function's text rather than its name (and just display its name to the user). Does that make sense?

Hi there,
I just discovered track-and-graph and I like it very much! However, I quickly looked for a way of processing the data before graphing it and I ended up here. I think that would be a great enhancement of this (already great) app.
I just wanted to suggest an alternative approach to this problem. Instead of a textual language like the one suggested above, it could be possible to use a graphical language in the spirit of Scratch or Pure Data.
So something like
Union(.reading_book_a, .reading_book_b)
could look like
And constants like MONTHLY above could simply become a dropdown menu of the PeriodicTotal box.
I'm a big fan of textual languages and usually don't like graphical ones... on the desktop. However, typing code on a mobile device is really cumbersome.
In addition to saving a lot of typing, such an approach would avoid syntax errors and prevent most possible errors in formulas (boxes could only be connected when that makes sense). It might also be a bit easier for non-programmers to use. And some graphical languages have boxes with several outputs, which would solve the "what specific information about a data set we are interested in when we pass it to a function" question.
It would probably require a little bit more code to make it work, though, although there might be libraries out there that could provide a significant part of what's needed.
Anyway, I don't have the time or the skills to implement it myself, so that's just a suggestion. I would appreciate a data transformation language, whatever form it takes. But as there is a discussion about the best way to do it, I thought I might add my 2¢...
Hi @amiguet .. I appreciate that this could be a superior user experience however it's simply not really feasible with the time I have to dedicate to this right now. The advantage of a language is that it can be developed, iterated and modified much more quickly. I suspect it's likely that most people won't need to set up too many functions so it probably won't be too much of a burden to the user. In any case I don't really have time to work on this at all, I am relying on @hhpmmd who said above that they are less comfortable with the UI side of things, so a UI heavy solution is not likely for now.
I completely understand. If I was to develop such a feature, I would definitely select the language version for the same reasons. I just wanted to make sure that this is a conscious choice and not a default solution. Anyway, many thanks to @hhpmmd to consider implementing this, that would make a great feature!
I agree with the points mentioned. I am vaguely familiar with Scratch and looked at some images of Pure Data. To me they look like they would be hard to manage without a mouse. @amiguet, are you aware of any graphical programming interfaces that are designed for mobile/phone usage? I guess you could do some recursive UI where there is only ever one node displayed and you have ways to navigate through the node tree. Anyway, in the end I feel like it's not really an either/or; rather, you would have a textual language and then a graphical one that builds upon the textual one (or at least its grammatical structure). I also hope that it's possible to do a lot with suggestions/autocomplete to guide the user.
As an update from my side: I did some experiments and I think it is definitely necessary to do a custom antlr grammar, but that also seems to be a lot easier than I thought. I'm also busier than I thought atm but I should be able to finish this by mid april. I also found this project https://github.com/massivemadness/Brackeys-IDE which supplies an editor with suggestions and highlighting etc. The info from the readme looks good, haven't tested it yet though.
You make a great point that we would need a language anyway under the hood. Brackeys IDE looks like an excellent find. Sounds like you're doing good work; don't stress too much about how much time you have, it will probably take me a while to get the UI and docs side done too. If you want to develop this together incrementally, then what probably makes the most sense is for me to create a branch for this feature, and we both create PRs for that branch and review/merge when we have time. I think we probably want to break the work up into chunks, but we also don't want to have long-running development on master for this.
Let me know when you're ready for that, but again no rush.
Yeah, having a separate branch on which we both work on sounds like the best way to move forward. When you find the time it would probably make sense for this branch to have a different starting activity with basically just a text input and output field which then allows for some easy prototyping in regards to adding more UI elements and so on. I guess that would be the way with the least upfront UI work.
Ok I will take a look at some point.
Hi @SamAmco uhm :point_right: :point_left: if you were to create a branch called feature/data_transformation, i think i have a some code I would like to push on there :flushed:
Hey :) created that branch for you
Does this issue cover data operations like
Plotting: Tracker1 - tracker2
Or is it something else?
I was wondering if we could plot basic operations between the variables (trackers) in a plot.
@dummifiedme yes that is the idea
Is this feature in development currently?
Not just yet. Honestly, I want to do it, but I want to do it well. There's a huge amount to consider here; it's a very complex feature. Currently I'm working on other features and bug fixes, trying to get the basics right before working on this feature, as I expect most people won't even use it. When I do implement it, though, I'm actually thinking that a UI-driven approach rather than a text-driven one might be better. Since I'm now using Compose for UI, I think this might be doable.
I expect this feature is at least a year, if not more, away, so please don't hold your breath. If you want more powerful analytic abilities in the meantime, you might want to set up a system where you export a CSV to Google Drive and have a spreadsheet that runs statistics, for example. Depending on how technically capable you are, there's really nothing stopping you. When you do a backup you are just exporting an SQLite database. You can write SQL queries around that data, get clever with Python and scikit-learn, or even go full-on machine learning with PyTorch or something.
For these reasons I would say something as simple as automatic backup is a higher priority than this. There are many features which I don't feel I can let people wait for while I work on this, but I will get there eventually.
Consider the following two scenarios:
Mathematically speaking, sometimes you want the derivative (1) and sometimes the antiderivative (2) (I hope I'm using those terms right).
Solution: Computing the data should be easy. For (1) subtract the previous tracked value from the new one, and for (2) add a sum of the previously tracked values to each value. Since the data can be generated on the fly from the original data, no additional data has to be saved in a database and there should be no backward compatibility issues.
There are different ways to have a user enable this data. I thought about having a check mark for numerical / time tracker:
Checking a mark would then lead to an additional entry when selecting data to plot, similar to how time-duration data has multiple entries for time-duration, hours, minutes and seconds.
I'm open to feedback, especially when it comes to wording (I'm not a native English speaker).
I think I can probably implement this myself, though some hints as to where I would likely have to change things would speed things up.