bogind / easycsv

An R package for easy data loading from multiple tables
GNU Lesser General Public License v2.1

merge into one dataframe? #2

Closed. chapmanjacobd closed this issue 5 years ago.

chapmanjacobd commented 5 years ago

After running fread_folder I'm left with a few hundred data frames in my environment, but no merged data frame is generated. I'm not sure if it's just the CSV files I'm using; maybe I'm an edge case. It's the first time I've used easycsv.

...
  sep=','  with 2 lines of 3 fields using quote rule 0
  sep=0x9  with 3 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (546 bytes from row 1 to eof) / (2 * 546 jump0size) == 0
  Type codes (jump 000)    : A775775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 2 sample rows
  All rows were sampled since file is small so we know nrow=2 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A775775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 2 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=422
Read 2 rows x 13 columns from 547 bytes file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         6 : int32     '5'
         4 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 17%) Memory map 0.000GB file
   0.001s ( 75%) sep='\t' ncol=13 and header detection
   0.000s (  3%) Column type detection using 2 sample rows
   0.000s (  2%) Allocation of 2 rows x 13 cols (0.000GB) of which 2 (100%) rows used
   0.000s (  3%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 2 rows) using 1 threads
   +    0.000s (  0%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  0%) Transpose
   +    0.000s (  3%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/73.csv
  File opened, size = 54.40KB (55701 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 99 lines of 3 fields using quote rule 0
  sep=0x9  with 100 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (55700 bytes from row 1 to eof) / (2 * 21245 jump0size) == 1
  Type codes (jump 000)    : A577775555A5A  Quote rule 0
  Type codes (jump 001)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 149 sample rows
  =====
  Sampled 149 rows (handled \n inside quoted fields) at 2 jump points
  Bytes from first data row on line 2 to the end of last row: 55576
  Line length: mean=214.02 sd=8.98 min=192 max=230
  Estimated number of rows: 55576 / 214.02 = 260
  Initial alloc = 286 rows (260 + 10%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
  =====
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 286 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=55576
Read 260 rows x 13 columns from 54.40KB (55701 bytes) file in 00:00.005 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s (  5%) Memory map 0.000GB file
   0.003s ( 67%) sep='\t' ncol=13 and header detection
   0.000s (  2%) Column type detection using 149 sample rows
   0.000s (  1%) Allocation of 286 rows x 13 cols (0.000GB) of which 260 ( 91%) rows used
   0.001s ( 25%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 260 rows) using 1 threads
   +    0.000s (  3%) Parse to row-major thread buffers (grown 0 times)
   +    0.001s ( 21%) Transpose
   +    0.000s (  1%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.005s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/74.csv
  File opened, size = 1.560KB (1597 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 7 lines of 3 fields using quote rule 0
  sep=0x9  with 8 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (1596 bytes from row 1 to eof) / (2 * 1596 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 7 sample rows
  All rows were sampled since file is small so we know nrow=7 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 7 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=1472
Read 7 rows x 13 columns from 1.560KB (1597 bytes) file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 27%) Memory map 0.000GB file
   0.000s ( 53%) sep='\t' ncol=13 and header detection
   0.000s (  6%) Column type detection using 7 sample rows
   0.000s (  5%) Allocation of 7 rows x 13 cols (0.000GB) of which 7 (100%) rows used
   0.000s ( 10%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 7 rows) using 1 threads
   +    0.000s (  1%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  3%) Transpose
   +    0.000s (  6%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/75.csv
  File opened, size = 342 bytes.
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 1 lines of 3 fields using quote rule 0
  sep=0x9  with 2 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (341 bytes from row 1 to eof) / (2 * 341 jump0size) == 0
  Type codes (jump 000)    : A775575555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 1 sample rows
  All rows were sampled since file is small so we know nrow=1 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A775575555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 1 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=217
Read 1 rows x 13 columns from 342 bytes file in 00:00.000 wall clock time
[12] Finalizing the datatable
  Type counts:
         7 : int32     '5'
         3 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 31%) Memory map 0.000GB file
   0.000s ( 54%) sep='\t' ncol=13 and header detection
   0.000s (  5%) Column type detection using 1 sample rows
   0.000s (  4%) Allocation of 1 rows x 13 cols (0.000GB) of which 1 (100%) rows used
   0.000s (  6%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 1 rows) using 1 threads
   +    0.000s (  0%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  1%) Transpose
   +    0.000s (  6%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.000s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/77.csv
  File opened, size = 17.74KB (18166 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 84 lines of 3 fields using quote rule 0
  sep=0x9  with 85 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (18165 bytes from row 1 to eof) / (2 * 18165 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 84 sample rows
  All rows were sampled since file is small so we know nrow=84 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 84 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=18041
Read 84 rows x 13 columns from 17.74KB (18166 bytes) file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 12%) Memory map 0.000GB file
   0.001s ( 72%) sep='\t' ncol=13 and header detection
   0.000s (  2%) Column type detection using 84 sample rows
   0.000s (  2%) Allocation of 84 rows x 13 cols (0.000GB) of which 84 (100%) rows used
   0.000s ( 13%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 84 rows) using 1 threads
   +    0.000s (  2%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  9%) Transpose
   +    0.000s (  2%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/8.csv
  File opened, size = 1.833KB (1877 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 8 lines of 3 fields using quote rule 0
  sep=0x9  with 9 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (1876 bytes from row 1 to eof) / (2 * 1876 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 8 sample rows
  All rows were sampled since file is small so we know nrow=8 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 8 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=1752
Read 8 rows x 13 columns from 1.833KB (1877 bytes) file in 00:00.002 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.001s ( 39%) Memory map 0.000GB file
   0.001s ( 47%) sep='\t' ncol=13 and header detection
   0.000s (  3%) Column type detection using 8 sample rows
   0.000s (  3%) Allocation of 8 rows x 13 cols (0.000GB) of which 8 (100%) rows used
   0.000s (  8%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 8 rows) using 1 threads
   +    0.000s (  1%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  3%) Transpose
   +    0.000s (  4%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.002s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/80.csv
  File opened, size = 775 bytes.
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 3 lines of 3 fields using quote rule 0
  sep=0x9  with 4 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (774 bytes from row 1 to eof) / (2 * 774 jump0size) == 0
  Type codes (jump 000)    : A777755555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 3 sample rows
  All rows were sampled since file is small so we know nrow=3 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777755555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 3 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=650
Read 3 rows x 13 columns from 775 bytes file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         6 : int32     '5'
         4 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 43%) Memory map 0.000GB file
   0.000s ( 45%) sep='\t' ncol=13 and header detection
   0.000s (  4%) Column type detection using 3 sample rows
   0.000s (  3%) Allocation of 3 rows x 13 cols (0.000GB) of which 3 (100%) rows used
   0.000s (  5%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 3 rows) using 1 threads
   +    0.000s (  0%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  1%) Transpose
   +    0.000s (  3%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/81.csv
  File opened, size = 27.89KB (28557 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 99 lines of 3 fields using quote rule 0
  sep=0x9  with 100 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (28556 bytes from row 1 to eof) / (2 * 21758 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  Type codes (jump 001)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 131 sample rows
  =====
  Sampled 131 rows (handled \n inside quoted fields) at 2 jump points
  Bytes from first data row on line 2 to the end of last row: 28432
  Line length: mean=217.04 sd=8.14 min=206 max=258
  Estimated number of rows: 28432 / 217.04 = 131
  Initial alloc = 144 rows (131 + 9%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
  =====
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 144 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=28432
Read 131 rows x 13 columns from 27.89KB (28557 bytes) file in 00:00.002 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s (  8%) Memory map 0.000GB file
   0.001s ( 57%) sep='\t' ncol=13 and header detection
   0.000s (  2%) Column type detection using 131 sample rows
   0.000s (  2%) Allocation of 144 rows x 13 cols (0.000GB) of which 131 ( 91%) rows used
   0.001s ( 32%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 131 rows) using 1 threads
   +    0.000s (  3%) Parse to row-major thread buffers (grown 0 times)
   +    0.001s ( 26%) Transpose
   +    0.000s (  3%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.002s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/82.csv
  File opened, size = 1.156KB (1184 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 5 lines of 3 fields using quote rule 0
  sep=0x9  with 6 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (1183 bytes from row 1 to eof) / (2 * 1183 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 5 sample rows
  All rows were sampled since file is small so we know nrow=5 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 5 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=1059
Read 5 rows x 13 columns from 1.156KB (1184 bytes) file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 35%) Memory map 0.000GB file
   0.000s ( 53%) sep='\t' ncol=13 and header detection
   0.000s (  4%) Column type detection using 5 sample rows
   0.000s (  3%) Allocation of 5 rows x 13 cols (0.000GB) of which 5 (100%) rows used
   0.000s (  5%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 5 rows) using 1 threads
   +    0.000s (  0%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  1%) Transpose
   +    0.000s (  4%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/84.csv
  File opened, size = 9.14KB (9357 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 44 lines of 3 fields using quote rule 0
  sep=0x9  with 45 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (9356 bytes from row 1 to eof) / (2 * 9356 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 44 sample rows
  All rows were sampled since file is small so we know nrow=44 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 44 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=9232
Read 44 rows x 13 columns from 9.14KB (9357 bytes) file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 17%) Memory map 0.000GB file
   0.001s ( 71%) sep='\t' ncol=13 and header detection
   0.000s (  2%) Column type detection using 44 sample rows
   0.000s (  2%) Allocation of 44 rows x 13 cols (0.000GB) of which 44 (100%) rows used
   0.000s (  8%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 44 rows) using 1 threads
   +    0.000s (  2%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  4%) Transpose
   +    0.000s (  2%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/86.csv
  File opened, size = 3.249KB (3327 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 15 lines of 3 fields using quote rule 0
  sep=0x9  with 16 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (3326 bytes from row 1 to eof) / (2 * 3326 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 15 sample rows
  All rows were sampled since file is small so we know nrow=15 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 15 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=3202
Read 15 rows x 13 columns from 3.249KB (3327 bytes) file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 26%) Memory map 0.000GB file
   0.000s ( 58%) sep='\t' ncol=13 and header detection
   0.000s (  4%) Column type detection using 15 sample rows
   0.000s (  4%) Allocation of 15 rows x 13 cols (0.000GB) of which 15 (100%) rows used
   0.000s (  9%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 15 rows) using 1 threads
   +    0.000s (  1%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  3%) Transpose
   +    0.000s (  4%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/88.csv
  File opened, size = 758 bytes.
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 3 lines of 3 fields using quote rule 0
  sep=0x9  with 4 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (757 bytes from row 1 to eof) / (2 * 757 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 3 sample rows
  All rows were sampled since file is small so we know nrow=3 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 3 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=633
Read 3 rows x 13 columns from 758 bytes file in 00:00.000 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 31%) Memory map 0.000GB file
   0.000s ( 53%) sep='\t' ncol=13 and header detection
   0.000s (  5%) Column type detection using 3 sample rows
   0.000s (  4%) Allocation of 3 rows x 13 cols (0.000GB) of which 3 (100%) rows used
   0.000s (  7%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 3 rows) using 1 threads
   +    0.000s (  0%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  1%) Transpose
   +    0.000s (  6%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.000s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/89.csv
  File opened, size = 793 bytes.
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 3 lines of 3 fields using quote rule 0
  sep=0x9  with 4 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (792 bytes from row 1 to eof) / (2 * 792 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 3 sample rows
  All rows were sampled since file is small so we know nrow=3 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 3 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=668
Read 3 rows x 13 columns from 793 bytes file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 29%) Memory map 0.000GB file
   0.000s ( 54%) sep='\t' ncol=13 and header detection
   0.000s (  5%) Column type detection using 3 sample rows
   0.000s (  3%) Allocation of 3 rows x 13 cols (0.000GB) of which 3 (100%) rows used
   0.000s (  9%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 3 rows) using 1 threads
   +    0.000s (  1%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  2%) Transpose
   +    0.000s (  6%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/9.csv
  File opened, size = 14.28KB (14618 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 66 lines of 3 fields using quote rule 0
  sep=0x9  with 67 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (14617 bytes from row 1 to eof) / (2 * 14617 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 66 sample rows
  All rows were sampled since file is small so we know nrow=66 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 66 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=14493
Read 66 rows x 13 columns from 14.28KB (14618 bytes) file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 16%) Memory map 0.000GB file
   0.001s ( 71%) sep='\t' ncol=13 and header detection
   0.000s (  2%) Column type detection using 66 sample rows
   0.000s (  2%) Allocation of 66 rows x 13 cols (0.000GB) of which 66 (100%) rows used
   0.000s (  9%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 66 rows) using 1 threads
   +    0.000s (  2%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  5%) Transpose
   +    0.000s (  2%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/90.csv
  File opened, size = 536 bytes.
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 2 lines of 3 fields using quote rule 0
  sep=0x9  with 3 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (535 bytes from row 1 to eof) / (2 * 535 jump0size) == 0
  Type codes (jump 000)    : A757575555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 2 sample rows
  All rows were sampled since file is small so we know nrow=2 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A757575555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 2 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=411
Read 2 rows x 13 columns from 536 bytes file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         7 : int32     '5'
         3 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 39%) Memory map 0.000GB file
   0.000s ( 47%) sep='\t' ncol=13 and header detection
   0.000s (  4%) Column type detection using 2 sample rows
   0.000s (  4%) Allocation of 2 rows x 13 cols (0.000GB) of which 2 (100%) rows used
   0.000s (  6%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 2 rows) using 1 threads
   +    0.000s (  0%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  1%) Transpose
   +    0.000s (  4%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/91.csv
  File opened, size = 2.021KB (2069 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 9 lines of 3 fields using quote rule 0
  sep=0x9  with 10 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (2068 bytes from row 1 to eof) / (2 * 2068 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 9 sample rows
  All rows were sampled since file is small so we know nrow=9 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 9 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=1944
Read 9 rows x 13 columns from 2.021KB (2069 bytes) file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 28%) Memory map 0.000GB file
   0.000s ( 57%) sep='\t' ncol=13 and header detection
   0.000s (  4%) Column type detection using 9 sample rows
   0.000s (  4%) Allocation of 9 rows x 13 cols (0.000GB) of which 9 (100%) rows used
   0.000s (  7%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 9 rows) using 1 threads
   +    0.000s (  1%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  2%) Transpose
   +    0.000s (  5%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/94.csv
  File opened, size = 5.780KB (5919 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 27 lines of 3 fields using quote rule 0
  sep=0x9  with 28 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (5918 bytes from row 1 to eof) / (2 * 5918 jump0size) == 0
  Type codes (jump 000)    : A777775555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 27 sample rows
  All rows were sampled since file is small so we know nrow=27 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777775555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 27 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=5794
Read 27 rows x 13 columns from 5.780KB (5919 bytes) file in 00:00.001 wall clock time
[12] Finalizing the datatable
  Type counts:
         5 : int32     '5'
         5 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 16%) Memory map 0.000GB file
   0.001s ( 70%) sep='\t' ncol=13 and header detection
   0.000s (  3%) Column type detection using 27 sample rows
   0.000s (  4%) Allocation of 27 rows x 13 cols (0.000GB) of which 27 (100%) rows used
   0.000s (  7%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 27 rows) using 1 threads
   +    0.000s (  1%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  4%) Transpose
   +    0.000s (  3%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.001s        Total
omp_get_max_threads() = 4
omp_get_thread_limit() = 2147483647
DTthreads = 0
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  skip num lines = 0
  show progress = 1
  0/1 column will be read as integer
[02] Opening the file
  Opening file /home/xk/dataprojects/ghsl/97.csv
  File opened, size = 342 bytes.
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<City Area    builtup75   builtup90   >>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep automatically ...
  sep=','  with 1 lines of 3 fields using quote rule 0
  sep=0x9  with 2 lines of 13 fields using quote rule 0
  Detected 13 columns on line 1. This line is either column names or first data row. Line starts as: <<City Area    builtup75   builtup90   >>
  Quote rule picked = 0
  fill=false and the most number of columns found is 13
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (341 bytes from row 1 to eof) / (2 * 341 jump0size) == 0
  Type codes (jump 000)    : A777555555A5A  Quote rule 0
  'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 1 sample rows
  All rows were sampled since file is small so we know nrow=1 exactly
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A777555555A5A
[10] Allocate memory for the datatable
  Allocating 13 column slots (13 - 0 dropped) with 1 rows
[11] Read the data
  jumps=[0..1), chunk_size=1048576, total_size=217
Read 1 rows x 13 columns from 342 bytes file in 00:00.000 wall clock time
[12] Finalizing the datatable
  Type counts:
         7 : int32     '5'
         3 : float64   '7'
         3 : string    'A'
=============================
   0.000s ( 30%) Memory map 0.000GB file
   0.000s ( 53%) sep='\t' ncol=13 and header detection
   0.000s (  5%) Column type detection using 1 sample rows
   0.000s (  4%) Allocation of 1 rows x 13 cols (0.000GB) of which 1 (100%) rows used
   0.000s (  8%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 1 rows) using 1 threads
   +    0.000s (  0%) Parse to row-major thread buffers (grown 0 times)
   +    0.000s (  1%) Transpose
   +    0.000s (  7%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
   0.000s        Total

I'm not sure what I'm doing wrong. There's no error message.

fread_folder(directory = "~/dataprojects/ghsl", extension = "CSV", check.names = TRUE, verbose = TRUE)

bogind commented 5 years ago

Hi, you're not doing anything wrong. I just haven't had the time to add the functionality that lets you create one data frame from a folder. That's next on the plan, but it's not currently supported.
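
In the meantime, the tables that fread_folder has already loaded can be stacked manually. A minimal, untested sketch (it assumes the only data frames in the global environment are the ones fread_folder created):

library(data.table)

# collect every data.frame/data.table currently in the global environment
objs <- ls(envir = .GlobalEnv)
tbls <- Filter(is.data.frame, mget(objs, envir = .GlobalEnv))

# stack them into one table, recording which object each row came from
merged <- rbindlist(tbls, use.names = TRUE, fill = TRUE, idcol = "source")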

alexfun commented 5 years ago

Hi, instead of assigning each table to the global environment, why don't you collect them in a list and apply rbindlist to it?
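
A minimal sketch of that pattern, independent of easycsv (the folder path is just the one from this issue):

library(data.table)

files  <- list.files("~/dataprojects/ghsl", pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(files, fread)                               # one data.table per file
merged <- rbindlist(tables, use.names = TRUE, fill = TRUE)   # stack into a single table

The modified version of fread_folder below applies the same idea inside the existing function.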

fread_folder <- 
    function (directory = NULL, extension = "CSV", sep = "auto", 
              nrows = -1L, header = "auto", na.strings = "NA", stringsAsFactors = FALSE, 
              verbose = getOption("datatable.verbose"), skip = 0L, drop = NULL, 
              colClasses = NULL, integer64 = getOption("datatable.integer64"), 
              dec = if (sep != ".") "." else ",", check.names = FALSE, 
              encoding = "unknown", quote = "\"", strip.white = TRUE, 
              fill = FALSE, blank.lines.skip = FALSE, key = NULL, Names = NULL, 
              prefix = NULL, showProgress = interactive(), data.table = TRUE) 
    {
        if (!("data.table" %in% rownames(installed.packages()))) {
            stop("data.table needed for this function to work. Please install it.",
                 call. = FALSE)
        }
        if (is.null(directory)) {
            os = Identify.OS()
            # choose a directory interactively when none was supplied
            if (tolower(os) == "windows") {
                directory <- utils::choose.dir()
            } else if (tolower(os) == "linux" | tolower(os) == "macosx") {
                directory <- choose_dir()
            } else {
                stop("Please supply a valid local directory")
            }
        }
        directory = paste(gsub(pattern = "\\", "/", directory, fixed = TRUE))
        endings = list()
        if (tolower(extension) == "txt") {
            endings[1] = "*\\.txt$"
        }
        if (tolower(extension) == "csv") {
            endings[1] = "*\\.csv$"
        }
        if (tolower(extension) == "both") {
            endings[1] = "*\\.txt$"
            endings[2] = "*\\.csv$"
        }
        if (!(tolower(extension) %in% c("txt", "csv", "both"))) {
            stop("Please supply a valid value for 'extension';\nallowed values are: 'TXT', 'CSV', 'BOTH'.")
        }
        tempfiles = list()
        temppath = list()
        tempdf_list = list()
        num = 1
        for (i in endings) {
            temppath = paste(directory, list.files(path = directory, 
                                                   pattern = i), sep = "/")
            tempfiles = list.files(path = directory, pattern = i)
            num = num + 1
            if (length(temppath) < 1 | length(tempfiles) < 1) {
                num = num + 1
            } else {
                temppath = unlist(temppath)
                tempfiles = unlist(tempfiles)
                count = 0
                for (tbl in temppath) {
                    count = count + 1
                    DTname1 = paste0(gsub(directory, "", tbl))
                    DTname2 = paste0(gsub("/", "", DTname1))
                    if (!is.null(Names)) {
                        if ((length(Names) != length(temppath)) | 
                            (class(Names) != "character")) {
                            stop("Names must be a character vector of the same length as the files to be read.")
                        } else {
                            DTname3 = Names[count]
                        }
                    } else {
                        DTname3 = paste0(gsub(i, "", DTname2))
                    }

                    if (!is.null(prefix) && is.character(prefix)) {
                        DTname4 = paste(prefix, DTname3, sep = "")
                    } else {
                        DTname4 = DTname3
                    }

                    DTable <- data.table::fread(input = tbl, sep = sep, 
                                                nrows = nrows, header = header, na.strings = na.strings, 
                                                stringsAsFactors = stringsAsFactors, verbose = verbose, 
                                                skip = skip, drop = drop, colClasses = colClasses, 
                                                dec = if (sep != ".") "." else ",", 
                                                check.names = check.names, encoding = encoding, 
                                                quote = quote, strip.white = strip.white, 
                                                fill = fill, blank.lines.skip = blank.lines.skip, 
                                                key = key, showProgress = showProgress, data.table = data.table)

                    # assign_to_global <- function(pos = 1) {
                    #     assign(x = DTname4, value = DTable, envir = as.environment(pos))
                    # }
                    # assign_to_global()

                    # keep the file-derived name so each list entry stays identifiable
                    tempdf_list[[DTname4]] <- DTable

                    rm(DTable)
                }
            }
        }

        tempdf = data.table::rbindlist(tempdf_list)

        if(!data.table) {
            tempdf = as.data.frame(tempdf)
        }

        return(tempdf)

    }
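
Usage sketch, assuming the modified function above has been sourced (arguments mirror the call from the original report):

ghsl <- fread_folder(directory = "~/dataprojects/ghsl",
                     extension = "CSV",
                     check.names = TRUE)
# ghsl is now a single data.table holding the rows of every CSV in the folder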
bogind commented 5 years ago

@alexfun looks good. Do you want to create a pull request?

alexfun commented 5 years ago

@bogind I would be more than happy to submit the function above; however, I am not sure whether you had something in mind with the code that assigns a variable name based on each file name in the folder. If you would like, I can add a new parameter combine taking one of the following values: c("data.frame", "global", "list") (a rough sketch of the dispatch is below the list), so that

  1. global preserves the existing behaviour.
  2. list returns a named list of the CSVs, using the current naming convention.
  3. data.frame returns a single data frame via rbindlist.
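
A rough, untested sketch of how that dispatch might look at the end of fread_folder (the combine parameter and this structure are only a proposal; tempdf_list is assumed to be the named list of tables built in the loop above):

    combine <- match.arg(combine, c("global", "list", "data.frame"))

    if (combine == "global") {
        # current behaviour: assign one object per file into the global environment
        for (nm in names(tempdf_list)) {
            assign(nm, tempdf_list[[nm]], envir = .GlobalEnv)
        }
        return(invisible(NULL))
    }
    if (combine == "list") {
        return(tempdf_list)
    }
    return(data.table::rbindlist(tempdf_list, use.names = TRUE, fill = TRUE))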
bogind commented 5 years ago

@alexfun The combine parameter seems logical. I think the default behaviour should be global.

alexfun commented 5 years ago

OK, I will write the code with global as the default behaviour and submit it to you for review.