Closed lemmy closed 3 years ago
It would not be a problem for us to ping the INRIA server. However, the users might get worried that we are spying on them. Would it be more reasonable to have a joint announcements list, so people would be asked to subscribe to it when they use TLC, Apalache, the VScode extension, etc?
The Toolbox prompts users to explicitly opt-in:
Right. That makes sense in the Toolbox. We probably can add an opt-in option in the command-line. But I doubt anybody would trigger it :-)
Prompting users to opt-in is the task of IDEs such as the Toolbox, VSCode, .... Like TLC, Apalache would simply report stats iff ~/.tlaplus/esc.txt
exists and has an opt-in marker:
How about the following. We isolate all the statistics collection in a well-defined script called bin/opt-in-report-stat
. The model checker script bin/apalache-mc
warns the user about the option of turning on the statistics collection. We add two commands to apalache-mc
, e.g., --feedback-allow
, --feedback-shutup
. The command --feedback-allow
writes REPORT_FEEDBACK=1
in ~/.apalache/feedback.rc
, whereas the command --feedback-shutup
write REPORT_FEEDBACK=0
in ~/.apalache/feedback.rc
. The script bin/opt-in-report-stat
uses this configuration file.
@shonfeder , what is your opinion?
We will also have to add a page explaining what we collect and how we do that. The large part is already written by @lemmy: https://github.com/tlaplus/tlaplus/blob/master/tlatools/org.lamport.tlatools/src/util/ExecutionStatisticsCollector.md
The approach sounds OK to me, but I don't know anything about this stuff and I'm not sure how the script will work to monitor the stats we want to collect. I've never implemented or facilitated any end-user telemetry before.
Wrapping stuff in additional scripts doesn't seem ideal to me, tho I get that this is for the sake of isolating the ~spying~ telemetry code.
Having a dedicated file for setting one flag seems non-ideal. Should we just have a more general configuration file with this as our first option?
edited to correcter terminology
What is the reason to re-write execution statistics/telemetry data collection? I created the issue, assuming that you would re-use the existing ExecutionStatisticsCollector and we change the wording in the Toolboxes (Eclipse & VSCode) to include Apalache and TLC. If you develop your own telemetry implementation, it is probably best to also separate the backend.
Btw. I very much value my privacy, but I think the term "spying" shouldn't be used in this context. It is telemetry data, users have to explicitly opt-in, and the collected data is publicly shared--quite different to how spies operate.
I don't think we have to invent our own telemetry. Just call yours in a different instance of java inside the script.
IMO this leads to an infrastructure question: Do you expect the Toolboxes to launch Apalache via the apalache-mc
script in the long run?
I don't think so. We have the class called Tool
that can be called from JVM. However, Toolbox gathers the statistics itself, right?
No, the Toolbox (Eclipse & VScode) simply creates the file ~/.tlaplus/esc.txt
and leave the actual reporting to the tool. The reason being that the Toolbox doesn't have access to all of the telemetry data.
I will check, whether we can quickly plug-in telemetry and add the opt-in option.
See #415
@lemmy the opt-in message in Toolbox is talking about TLC. Could you also add a line that this opt-in works for Apalache (and probably other TLA+ tools) too? Otherwise, it may be seen as an unexpected behavior.
It's perhaps best to replace TLC
with TLA+ Tools
.
Are all other items of the list going to apply?
Total number of cores and cores assigned to TLC Heap and off-heap memory allocated to TLC TLC version (git commit SHA) If breadth-first search, depth-first search or simulation mode is active TLC implemenation for the sets of seen and unseen states If TLC has been launched from an IDE Name, version, and architecture of the OS Vendor, version, and architecture of JVM The current date and time An installation ID which allows to group execution statistic (optionally)
So far, here is what I managed to report:
We only enable/disable statistics in the apalache command line. There is no choice between ID and no ID. But if the ID is disabled in Toolbox, your code will respect it.
This one is done.
So far, I don't see Apalache stats in the data.
There should be a few entries when I tried to run it on my computer. Apart from that, it is not released yet.
It is showing up here: https://exec-stats.tlapl.us/
Can it be a parsing problem? E.g., our version string is longer than that of TLC.
To keep the backend simple, can you please also report:
result.put("osVersion", System.getProperty("os.version"));
result.put("jvmOffHeapMem", "0");
Added. Just have to wait until PR is built.
There is at least one entry at exec-stats now and I see the apalache entry!
Closing the issue.
- do you want to know, if Apalache is run in an IDE?
I and others have been setting ide
to e.g. "github" when TLC runs as part of automation.
Why would you collect stats for the github runs?
Why not? It tells if users run our tools as part of automation and we can easily exclude the datapoints via ide
in other queries. For example, I wouldn't be surprised if automation correlates with beefier machines.
@konnov asked during the TLA+ Community Event about worldwide TLA+ user numbers. TLC happens to collect execution statistics that give us a rough approximate. Does Apalache want to integrate it execution statistics collection as well? I'm happy to share all the technical details if sending statistics to INRIA is not a no-go. What's collected (for TLC) is listed at https://github.com/tlaplus/tlaplus/blob/master/tlatools/org.lamport.tlatools/src/util/ExecutionStatisticsCollector.md and the data is publicly shared.
slightly related: https://github.com/alygin/vscode-tlaplus/issues/114