TREX-CoE / trexio_tools

Set of tools for trexio files
BSD 3-Clause "New" or "Revised" License

Clarification on ~dim readonly~ #37

Closed addman2 closed 7 months ago

addman2 commented 7 months ago

I have a question about the meaning of ~dim readonly~. When I specify a sparse array, I can set its dimensions in two ways: as a normal ~dim~ or as a ~dim readonly~. My understanding of the difference is that ~dim~ specifies the array dimensions, and values within that range can be written, while ~dim readonly~ assumes nothing about the data size, which is instead inferred from the largest coordinates in the data.

I made this small example. In the MO group I defined a sparse array, data, whose dimensions data_x and data_y are both of type ~dim~:

  int buffer_size = 10;
  int num_coo = 2;
  int *buffer_coo = (int *) malloc(buffer_size * num_coo * sizeof(int));
  double *buffer_value = (double *) malloc(buffer_size * sizeof(double));

  for (int i = 0; i < buffer_size; i++) {
    buffer_coo[i] = i;
    buffer_coo[i + buffer_size] = i;
    buffer_value[i] = i / 10.0;
  }

  rc = trexio_write_mo_data_x(f, 100);
  if (rc != TREXIO_SUCCESS) {
    fprintf(stderr, "Error: %s\n", trexio_string_of_error(rc));
    return -1;
  }

  rc = trexio_write_mo_data_y(f, 100);
  if (rc != TREXIO_SUCCESS) {
    fprintf(stderr, "Error: %s\n", trexio_string_of_error(rc));
    return -1;
  }

  rc = trexio_write_mo_data(f, 0, buffer_size, buffer_coo, buffer_value);
  if (rc != TREXIO_SUCCESS) {
    fprintf(stderr, "Error: %s\n", trexio_string_of_error(rc));
    return -1;
  }

  free(buffer_coo);
  free(buffer_value);

This example works perfectly fine. However, I would like to use ~dim readonly~. When I change data_x and data_y to this type, the corresponding write functions disappear from the API, as expected.

  int buffer_size = 10;
  int num_coo = 2;
  int *buffer_coo = (int *) malloc(buffer_size * num_coo * sizeof(int));
  double *buffer_value = (double *) malloc(buffer_size * sizeof(double));

  for (int i = 0; i < buffer_size; i++) {
    buffer_coo[i] = i;
    buffer_coo[i + buffer_size] = i;
    buffer_value[i] = i / 10.0;
  }

  rc = trexio_write_mo_data(f, 0, buffer_size, buffer_coo, buffer_value);
  if (rc != TREXIO_SUCCESS) {
    fprintf(stderr, "Error: %s\n", trexio_string_of_error(rc));
    return -1;
  }

  free(buffer_coo);
  free(buffer_value);

However, this example fails with error 24, "Attribute does not exist in the file", as if the dimensions were not set. What am I doing wrong?

q-posev commented 7 months ago

Hi @addman2,

In the future, please direct questions about the TREXIO format to the dedicated repo.

The dim readonly variable was designed for storing "manually" written quantities like CI determinants or the CSF. In practice, each time you write a batch of CI determinants, the determinants_num value in the file is updated by TREXIO behind the scenes. This is why it is readonly: the user is not supposed to modify it (hence the missing write_ functions ;-) ).
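
To illustrate the idea, here is a minimal toy model in plain C. It is not the TREXIO API: toy_det_file, toy_read_det_num and toy_append_determinants are hypothetical names made up for this sketch. The point is only that the counter has a reader but no writer, and that it grows as a side effect of appending batches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names belong to the TREXIO API. */
typedef struct {
    int64_t det_num;  /* plays the role of a "dim readonly" attribute */
} toy_det_file;

/* A reader exists, but there is deliberately no toy_write_det_num(). */
int64_t toy_read_det_num(const toy_det_file *f) {
    assert(f != NULL);
    return f->det_num;
}

/* Appending a batch is the only operation that changes the counter.
 * Returns 0 on success and -1 if the batch size is negative. */
int toy_append_determinants(toy_det_file *f, int64_t batch_size) {
    assert(f != NULL);
    if (batch_size < 0) return -1;
    f->det_num += batch_size;  /* updated "behind the scenes" */
    return 0;
}
```

Starting from a zero-initialised toy_det_file, appending a batch of 10 and then a batch of 5 leaves toy_read_det_num at 15 without the caller ever setting the counter directly.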

Getting the "Attribute does not exist in the file" error here is expected: remember, in TREXIO the user is not supposed to write arrays before their dimensions have been written (see tutorials and documentation). The only exceptions are the aforementioned CI determinants and CSF.

For example, when one wants to write 2-electron integrals (which are float sparse and have mo.num as dimension), mo.num is checked internally by TREXIO before the integrals are written. If mo.num is not present in the file, you get the "Attribute does not exist in the file" error. The sparse format uses an efficient internal compression technique, which checks the values of the dimensioning variables before compressing the storage.
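
A rough sketch of that order of checks, again as a toy model rather than TREXIO code. All names (toy_sparse_file, toy_write_dim, toy_write_sparse, toy_index_bytes, the toy_status values) are hypothetical. Two things are assumptions of this sketch, not statements about TREXIO internals: the interleaved index layout, and the idea that the dimension selects how many bytes each stored index needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names belong to the TREXIO API. */
enum toy_status {
    TOY_SUCCESS = 0,
    TOY_ATTR_MISSING,  /* the dimension was never written */
    TOY_INVALID_ARG    /* bad size, or an index outside [0, dim) */
};

typedef struct {
    bool    has_dim;  /* has the dimension been written yet? */
    int64_t dim;      /* plays the role of mo.num */
    int64_t stored;   /* number of sparse entries accepted so far */
} toy_sparse_file;

enum toy_status toy_write_dim(toy_sparse_file *f, int64_t dim) {
    assert(f != NULL);
    if (dim <= 0) return TOY_INVALID_ARG;
    f->dim = dim;
    f->has_dim = true;
    return TOY_SUCCESS;
}

/* Assumption: the dimension decides the width of each stored index. */
int toy_index_bytes(int64_t dim) {
    if (dim - 1 <= UINT8_MAX)  return 1;
    if (dim - 1 <= UINT16_MAX) return 2;
    return 4;
}

/* index holds size * num_coo entries, interleaved per element
 * (an assumption of this toy layout). */
enum toy_status toy_write_sparse(toy_sparse_file *f, int64_t size, int num_coo,
                                 const int32_t *index, const double *value) {
    assert(f != NULL && index != NULL && value != NULL);
    if (!f->has_dim) return TOY_ATTR_MISSING;  /* dimension is checked first */
    if (size < 0 || num_coo <= 0) return TOY_INVALID_ARG;
    for (int64_t i = 0; i < size * num_coo; i++) {
        if (index[i] < 0 || index[i] >= f->dim) return TOY_INVALID_ARG;
    }
    (void) value;  /* a real writer would store the values here */
    f->stored += size;
    return TOY_SUCCESS;
}
```

In this model, calling toy_write_sparse on a fresh toy_sparse_file returns TOY_ATTR_MISSING, and the same call succeeds once toy_write_dim has been called. That mirrors the two examples in the question: the first writes data_x and data_y before the array, the second does not.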

Hope this answers your question.