-
**Regarding synthetic datasets:** from the implementation, and as explained in [the issue](https://github.com/HazyResearch/safari/issues/35), _train loss is evaluated on all tokens and test is only …
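For anyone reading along, here is a minimal sketch of that distinction in plain NumPy. This is not the repository's actual code; the choice of scoring only the final position at test time is an assumed example of such a mask:

```python
import numpy as np

def cross_entropy(logits, targets, mask=None):
    """Mean cross-entropy; optionally restricted to positions where mask is True."""
    logits = logits - logits.max(axis=-1, keepdims=True)        # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -np.take_along_axis(logp, targets[..., None], axis=-1).squeeze(-1)
    if mask is not None:
        return nll[mask].mean()                                 # only masked positions
    return nll.mean()                                           # every position

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 10))          # (batch, seq, vocab)
targets = rng.integers(0, 10, size=(2, 4))

train_loss = cross_entropy(logits, targets)   # train: all tokens contribute
mask = np.zeros((2, 4), dtype=bool)
mask[:, -1] = True                            # test: e.g. score only the final token
test_loss = cross_entropy(logits, targets, mask)
```

Because the two losses average over different token sets, they are not directly comparable, which is presumably the point the issue makes.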
-
### Environment Details
* SDGym version: 0.8.0
* Python version: 3.11.5
* Operating System: Windows 11
### Error Description
When running the same code as #321 , the following error was enc…
-
A while back, Lucene changed the way that it encodes doc IDs from PFOR-delta to FOR-delta, which is a bit faster but less space-efficient. In order to avoid introducing space-efficiency regressions (e…
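To make the trade-off concrete, here is a toy FOR-delta encoder, not Lucene's actual implementation: deltas between sorted doc IDs are grouped into blocks, and every value in a block is packed at the bit width of the block's largest delta. PFOR would instead patch the few outliers so the rest of the block can use a smaller width, which is where the space savings come from:

```python
def for_delta_encode(doc_ids, block_size=128):
    """Toy FOR-delta: delta-encode sorted doc IDs, then record the bit width
    needed for the largest delta in each block (all values pad to that width)."""
    deltas = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    blocks = []
    for i in range(0, len(deltas), block_size):
        block = deltas[i:i + block_size]
        width = max(d.bit_length() for d in block)   # one width per block
        blocks.append((width, block))
    return blocks

def for_delta_decode(blocks):
    out, prev = [], 0
    for _, block in blocks:
        for d in block:
            prev += d
            out.append(prev)
    return out

ids = [3, 7, 8, 120, 121, 4000]
encoded = for_delta_encode(ids, block_size=4)
# The single large gap before 4000 forces its whole block to 12 bits under
# FOR; PFOR would store that delta as a patch and keep the block narrow.
```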
-
200k synthetically generated DuckDB SQL queries, based on the DuckDB v.0.9.2 documentation
Can you share how the 200k dataset was generated?
I want to generate similar data for my own datasets.
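While waiting for the authors' answer: one common approach to this kind of synthetic SQL generation is filling templates with identifiers sampled from a schema. The sketch below is purely illustrative and is an assumption, not the pipeline used for the 200k queries; table names, columns, and templates are all made up:

```python
import random

# Hypothetical templates and schema, for illustration only.
TEMPLATES = [
    "SELECT {cols} FROM {table} WHERE {col} > {val};",
    "SELECT {col}, COUNT(*) FROM {table} GROUP BY {col};",
    "SELECT {cols} FROM {table} ORDER BY {col} LIMIT {val};",
]
SCHEMA = {"trips": ["fare", "distance", "tip"], "users": ["age", "signup_year"]}

def sample_query(rng):
    """Draw a template and fill it with columns/values from a random table."""
    table = rng.choice(list(SCHEMA))
    cols = SCHEMA[table]
    return rng.choice(TEMPLATES).format(
        table=table,
        col=rng.choice(cols),
        cols=", ".join(rng.sample(cols, k=2)),
        val=rng.randint(1, 100),
    )

rng = random.Random(42)
queries = [sample_query(rng) for _ in range(5)]
```

A documentation-driven pipeline would presumably extract its templates and function names from the DuckDB v0.9.2 docs rather than hard-code them like this.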
-
The package has detailed documentation, but the paper is quite difficult for a non-specialist audience to read, mainly in the second part. Please add example use cases of the package for researchers i…
-
Hello,
I could not find some of the datasets that you load in your code.
In particular, I'm looking at your synthetic datasets (I didn't check if there were issues with the RWTS).
In [result-…
-
Great work!
I commented out all the push_to_hub calls in the code. Is the synthetic_data_llama-3-8b-instruct-sppo-iter3_score dataset generated by PairRM?
[rank4]: Traceback (most recent call last):
[rank4]:…
-
Dear Author,
Thank you for your excellent work!
I am interested in applying your denoising algorithm to my own real images.
Could you please provide guidance on how to achieve this?
-
I am trying to use the code provided in the MOVi challenge to generate synthetic datasets by myself.
But I am confused about the format of the intrinsic matrix of the generated data, because of the u…
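For reference, a standard pinhole intrinsic matrix in pixel units is assembled as below. Whether the generated data stores focal length in pixels, millimetres, or normalised units is exactly the convention to check; the focal length and sensor width here are illustrative values, not the dataset's actual camera parameters:

```python
import numpy as np

def intrinsics(focal_mm, sensor_width_mm, width_px, height_px):
    """Pinhole intrinsic matrix K in pixel units, assuming square pixels
    and the principal point at the image centre."""
    fx = focal_mm / sensor_width_mm * width_px   # focal length in pixels
    fy = fx                                      # square pixels -> fy == fx
    cx, cy = width_px / 2.0, height_px / 2.0
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

K = intrinsics(focal_mm=35.0, sensor_width_mm=32.0, width_px=256, height_px=256)
```

If the stored matrix has entries between 0 and 1, it is likely normalised by the image resolution and needs scaling by width/height before use.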
-
It would be nice to have one simple dataset used as a default, or two with different properties (for instance, different sizes). The goal would be to exemplify the data needed for new case studies and to con…
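To illustrate the idea, such defaults could be generated on the fly rather than shipped as files. The schema and column names below are made up, just to show the "two sizes" variant of the suggestion:

```python
import random

def make_toy_dataset(n_rows, seed=0):
    """Generate a tiny tabular dataset: two numeric columns and one categorical."""
    rng = random.Random(seed)                      # fixed seed -> reproducible default
    return [
        {"x": rng.gauss(0, 1), "y": rng.gauss(5, 2), "group": rng.choice("abc")}
        for _ in range(n_rows)
    ]

small = make_toy_dataset(100)     # quick smoke tests and docs examples
large = make_toy_dataset(10_000)  # same schema, different scale
```

A fixed seed keeps the default dataset identical across runs, which matters if docs or tests reference specific values from it.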