fiedsch / datamanagement

Data management helpers (PHP-CLI)
MIT License

Datamanagement Tools

PHP classes and helpers for managing data read from text files

Examples

Work on CSV data

<?php

require __DIR__ . '/../vendor/autoload.php';

use Fiedsch\Data\File\CsvReader;

try {

  // Open testdata.csv, using ";" as the field delimiter.
  $reader = new CsvReader("testdata.csv", ";");

  // Read and handle all lines containing data.
  while (($line = $reader->getLine()) !== null) {
    // Ignore empty lines (i.e. lines containing no data).
    if (!$reader->isEmpty($line)) {
      print_r($line);
    }
  }
  // $reader->close(); // not needed: it is called automatically once there are no more lines

} catch (Exception $e) {
  print $e->getMessage() . "\n";
}
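To make the loop concrete without the library, here is a rough equivalent built only on PHP's own fgetcsv(). It is a sketch, not how CsvReader is implemented. The helper isEmptyRow() is hypothetical, and it assumes a line counts as empty when every field is null or an empty string (fgetcsv() returns [null] for a completely blank line). An in-memory stream stands in for testdata.csv.

```php
<?php

// Hypothetical helper: a row is empty when all of its fields are blank.
// fgetcsv() returns [null] for a completely blank line.
function isEmptyRow(array $row): bool
{
    foreach ($row as $field) {
        if ($field !== null && $field !== '') {
            return false;
        }
    }
    return true;
}

// In-memory sample data so the sketch runs without a testdata.csv file.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "id;name\n1;Alice\n\n;\n2;Bob\n");
rewind($handle);

$rows = [];
// Length 0 = unlimited line length; enclosure and escape are passed explicitly.
while (($row = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
    if (!isEmptyRow($row)) {
        $rows[] = $row;
    }
}
fclose($handle);

print_r($rows);
```

Both the blank line and the line consisting only of ";" are skipped here, which is one plausible reading of "lines containing no data".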

Features

As of v0.3.2, the typical boilerplate "open file, read every non-empty line, close file" can be written more concisely by passing the optional flag Reader::SKIP_EMPTY_LINES to getLine():

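<?php

use Fiedsch\Data\File\Reader; // provides the SKIP_EMPTY_LINES constant

// ... create $reader as in the example above ...
while (($line = $reader->getLine(Reader::SKIP_EMPTY_LINES)) !== null) {
  print_r($line);
}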
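The flag moves the emptiness check from the caller into getLine() itself. How Reader does this internally is not documented here; the sketch below uses a hypothetical MiniCsvReader class with its own SKIP_EMPTY_LINES constant purely to illustrate the pattern of an optional bit-flag parameter. Like the loops above, getLine() returns null once the input is exhausted.

```php
<?php

// Hypothetical reader illustrating an optional flag argument to getLine().
final class MiniCsvReader
{
    public const SKIP_EMPTY_LINES = 1;

    /** @var resource */
    private $handle;
    private string $delimiter;

    // Reads from an in-memory string so the sketch needs no file on disk.
    public function __construct(string $data, string $delimiter = ';')
    {
        $this->handle = fopen('php://memory', 'r+');
        fwrite($this->handle, $data);
        rewind($this->handle);
        $this->delimiter = $delimiter;
    }

    // A line is empty when every field is null or ''.
    public function isEmpty(array $line): bool
    {
        foreach ($line as $field) {
            if ($field !== null && $field !== '') {
                return false;
            }
        }
        return true;
    }

    // Returns the next line as an array, or null when the input is exhausted.
    // With SKIP_EMPTY_LINES set, empty lines are skipped inside this method.
    public function getLine(int $flags = 0): ?array
    {
        while (($line = fgetcsv($this->handle, 0, $this->delimiter, '"', '\\')) !== false) {
            if (($flags & self::SKIP_EMPTY_LINES) && $this->isEmpty($line)) {
                continue;
            }
            return $line;
        }
        return null;
    }
}

$reader = new MiniCsvReader("a;b\n\n;\nc;d\n");
$lines = [];
while (($line = $reader->getLine(MiniCsvReader::SKIP_EMPTY_LINES)) !== null) {
    $lines[] = $line;
}
print_r($lines);
```

Using a bit flag rather than a boolean leaves room for combining further options with `|` later.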

Data augmentation

<?php

require __DIR__ . '/../vendor/autoload.php';

use Fiedsch\Data\File\CsvReader;
use Fiedsch\Data\File\Reader; // provides the SKIP_EMPTY_LINES constant
use Fiedsch\Data\Augmentation\Augmentor;
use Fiedsch\Data\Augmentation\Provider\TokenServiceProvider;
use Fiedsch\Data\File\CsvWriter;

try {

  $augmentor = new Augmentor();

  // Register the token service (used below as $augmentor['token']).
  $augmentor->register(new TokenServiceProvider());

  // Rule "token": add a unique token to every data line.
  $augmentor->addRule('token', function (Augmentor $augmentor, $data) {
    return ['token' => $augmentor['token']->getUniqueToken()];
  });

  $reader = new CsvReader("testdata.csv", ";");

  $writer = new CsvWriter("testdata.augmented.txt", "\t");

  $header_written = false;

  while (($line = $reader->getLine(Reader::SKIP_EMPTY_LINES)) !== null) {
    // Apply all registered rules to the current line.
    $result = $augmentor->augment($line);
    // Write the header once: input line number, augmented columns, original columns.
    if (!$header_written) {
      $writer->printLine(array_merge(['input_line'], array_keys($result), $reader->getHeader()));
      $header_written = true;
    }
    $writer->printLine(array_merge([$reader->getLineNumber()], $result, $line));
  }

  $writer->close();

} catch (Exception $e) {
  print $e->getMessage() . "\n";
}
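The Augmentor appears to combine two roles: a container of services, accessed array-style as $augmentor['token'] (similar to how the Pimple container works), and a list of named rules whose returned arrays are merged into one result per input line. The following sketch mirrors that idea with a hypothetical MiniAugmentor class (PHP 8 or later). It is an assumption about the design, not the library's code, and a simple counter service stands in for the token service.

```php
<?php

// Hypothetical stand-in for Augmentor: services via ArrayAccess plus named rules.
final class MiniAugmentor implements ArrayAccess
{
    private array $services = [];
    private array $rules = [];

    public function addRule(string $name, callable $rule): void
    {
        $this->rules[$name] = $rule;
    }

    // Applies every rule to $data and merges the associative arrays they return.
    public function augment(array $data): array
    {
        $result = [];
        foreach ($this->rules as $rule) {
            $result = array_merge($result, $rule($this, $data));
        }
        return $result;
    }

    public function offsetExists(mixed $offset): bool { return isset($this->services[$offset]); }
    public function offsetGet(mixed $offset): mixed { return $this->services[$offset]; }
    public function offsetSet(mixed $offset, mixed $value): void { $this->services[$offset] = $value; }
    public function offsetUnset(mixed $offset): void { unset($this->services[$offset]); }
}

$augmentor = new MiniAugmentor();
// Counter service standing in for the token service.
$augmentor['counter'] = new class {
    private int $n = 0;
    public function next(): int { return ++$this->n; }
};
$augmentor->addRule('id', function (MiniAugmentor $a, array $data) {
    return ['id' => $a['counter']->next()];
});
$augmentor->addRule('name_length', function (MiniAugmentor $a, array $data) {
    return ['name_length' => strlen($data[1])];
});

$input = [['1', 'Alice'], ['2', 'Bob']];
$output = [];
foreach ($input as $i => $line) {
    $result = $augmentor->augment($line);
    // Same column order as the example above: line number, added fields, original fields.
    $output[] = array_values(array_merge([$i + 1], $result, $line));
}
print_r($output);
```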

Creating Tokens

Method one: let the TokenCreator make sure we get unique tokens:

<?php

require __DIR__ . '/../vendor/autoload.php';

use Fiedsch\Data\Utility\TokenCreator;
use Fiedsch\Data\File\Writer;

// Tokens of length 10 built from the UPPER character set.
$creator = new TokenCreator(10, TokenCreator::UPPER);

$output = new Writer('mytokens.txt');
$numTokens = 1000;

// Write one token per line.
while ($numTokens-- > 0) {
  $token = $creator->getUniqueToken();
  $output->printLine([$token]);
}
$output->close();
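For getUniqueToken() to reject repeats, the creator presumably has to remember every token it has already returned, so its memory use grows with the number of tokens. The sketch below illustrates that approach with a hypothetical MiniTokenCreator class. It assumes that TokenCreator::UPPER stands for the letters A to Z; the library's actual alphabet may differ.

```php
<?php

// Hypothetical token creator; the alphabet A-Z is an assumption for UPPER.
final class MiniTokenCreator
{
    private const UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

    private int $length;
    /** @var array<string, true> tokens handed out so far, used as a set */
    private array $seen = [];

    public function __construct(int $length)
    {
        $this->length = $length;
    }

    // Random token from the alphabet, without any uniqueness check.
    public function createToken(): string
    {
        $alphabet = self::UPPER;
        $max = strlen($alphabet) - 1;
        $token = '';
        for ($i = 0; $i < $this->length; $i++) {
            $token .= $alphabet[random_int(0, $max)];
        }
        return $token;
    }

    // Draws tokens until one has not been seen before and remembers it.
    public function getUniqueToken(): string
    {
        do {
            $token = $this->createToken();
        } while (isset($this->seen[$token]));
        $this->seen[$token] = true;
        return $token;
    }
}

$creator = new MiniTokenCreator(10);
$tokens = [];
for ($n = 0; $n < 1000; $n++) {
    $tokens[] = $creator->getUniqueToken();
}
echo count(array_unique($tokens)), " unique tokens\n";
```

The tokens are used as array keys, which gives constant-time lookups. Because they consist of letters only, PHP does not convert them to integer keys.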

Method two: generate tokens without the uniqueness check and verify afterwards that they are unique. For large numbers of tokens this may be faster and use fewer resources:

  // same as above, except that
  // $token = $creator->getUniqueToken();
  // becomes
  $token = $creator->createToken();
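Why is it reasonable to skip the check while generating? When the space of possible tokens is large, duplicates are rare. The birthday approximation p ≈ 1 - exp(-n(n-1) / (2N)) estimates the chance of at least one duplicate among n tokens drawn from N possibilities. The sketch below computes it; the function name duplicateProbability() is hypothetical, and the call assumes again that UPPER means a 26-letter alphabet.

```php
<?php

// Birthday approximation: probability of at least one duplicate among
// $n tokens drawn uniformly from $alphabetSize ** $length possibilities.
function duplicateProbability(int $n, int $alphabetSize, int $length): float
{
    $space = (float) ($alphabetSize ** $length); // number of possible tokens
    $pairs = $n * ($n - 1) / 2;                  // number of token pairs
    return -expm1(-$pairs / $space);             // 1 - exp(-pairs / space), numerically stable
}

printf("%.2e\n", duplicateProbability(1000, 26, 10));
```

Under that assumption, 1000 tokens of length 10 collide with a probability of roughly 3.5e-9. The check below still confirms the actual output.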

Check that the generated tokens are unique

echo "If both lines show the same number, there were no duplicate tokens."
wc -l mytokens.txt
sort mytokens.txt | uniq | wc -l
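The same check can be done in PHP, and it also shows which tokens are duplicated. In this sketch, findDuplicates() is a hypothetical helper. The sample array stands in for the token file and assumes one unquoted token per line, as written by the Writer example.

```php
<?php

// Returns each value that occurs more than once, with its number of occurrences.
function findDuplicates(array $tokens): array
{
    return array_filter(
        array_count_values($tokens),
        function (int $count): bool { return $count > 1; }
    );
}

// In practice: $tokens = file('mytokens.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$tokens = ['QWERTYUIOP', 'ASDFGHJKLZ', 'QWERTYUIOP', 'ZXCVBNMLKJ'];

$duplicates = findDuplicates($tokens);
if ($duplicates === []) {
    echo "no duplicate tokens\n";
} else {
    print_r($duplicates);
}
```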