dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.21k stars 1.57k forks source link

List.split #29837

Closed Hixie closed 3 years ago

Hixie commented 7 years ago

It would be great if there was an analogue to String.split on List. It would take a predicate and return an Iterable of Iterables or a List of Lists or some such.

For example here I have a bunch of lines and I want to separate them at each empty line so that each block of text is in its own list.

lrhn commented 7 years ago

Sounds like either grouping or splitting, depending on the perspective (perhaps whether the matches are included or not).

String split is "easy" because we have the Pattern class to specify a match that can span multiple characters, and we have a Match class to represent the match. Also, the match doesn't include the parts between matches, but strings are random-access so that's not a a problem.

If we try to generalize that. it should work on lists, not iterables, and the match would represent a sequence of list elements. (And that's where it becomes annoying that strings aren't just Uint16 lists, then it would really generalize :)

All in all, it sounds fairly specialized (or if generalized completely, perhaps too general too be easy to use). I'm not sure it's worth putting directly on the List class, but it should be simple to make a utility function that just splits in single elements:

Iterable<List<T>> split<T>(List<T> elements, bool test(T element), 
                       { bool emptyParts = false }) sync* {
  if (elements.isEmpty) return;
  int start = 0;
  for (int i = 0; i < elements.length; i++) {
    var element = elements[i];
    if (test(element)) {
      if (start < i || emptyParts) {
        yield elements.sublist(start, i);
      }
      start = i + 1;
    }
  }
  if (start < elements.length || emptyParts) {
    yield elements.sublist(start);
  }
}
lrhn commented 3 years ago

We now have splitAfter, splitBefore and splitBetween extension methods in package:collection.

That should be sufficient for most predicate-based attempts to split a list.

We don't currently plan to support anything requiring a "list pattern".

insinfo commented 1 year ago

I'm currently working on a project that consists of porting a python library to dart, and I came across this urge to split a list based on a specific element and I saw that python has a native List.split.

I made this implementation below that solved my problem but it would be interesting if the class list was richer like in python, c# and java.


static List<List<T>> splitList<T>(List<T> list, dynamic splitBy) {
    var results = <List<T>>[];
    var result = <T>[];
    //var count = 0;
    for (var item in list) {
      if (item == splitBy) {
        results.add(result);
        result = [];
      } else {
        result.add(item);
      }
    }

    return results;
  }
  //Example: 
  const int NULL_BYTE = 0;
  var bytes = [12, 24, 0, 50, 47, 0];
  print(Utils.splitList(bytes, NULL_BYTE));
  //Result [[12, 24], [50, 47]]
eernstg commented 1 year ago

At this point you'd probably use an extension method (added to Dart in version 2.7, Dec 2019):

extension IterableSplit<T> on Iterable<T> {
  List<List<T>> split(Object? splitBy) {
    var results = <List<T>>[];
    var result = <T>[];
    for (var item in this) {
      if (splitBy == item) {
        results.add(result);
        result = [];
      } else {
        result.add(item);
      }
    }
    return results;
  }
}

void main() {
  const int nullByte = 0;
  var bytes = [12, 24, 0, 50, 47, 0];
  print(bytes.split(nullByte)); // '[[12, 24], [50, 47]]'.
}
lrhn commented 1 year ago

The problem here can be solved by spliteBefore or splitAfter, if you don't mind having the empty line in one of the parts.

Splitting a list and removing some of the elements (just one here, but why not generalize?) brings us back to the "sublist pattern" problem, where you need to recognize which slices to include and which to exclude. (For example, what would you do if text blocks are split by two empty lines?)

That sounds more like a design like:

extension Split<T> on Iterable<T> {
   /// Splits elements into lists.
   ///
   /// Starts a list at the start, and runs until [startSplit] returns true for an element.
   /// That becomes the first list emitted.
   /// Then skips elements until [endSplit] returns true, *starting with the same element again*.
   /// The first element where [endSplit] returns true is the first element of the next list,
   /// which the continues adding elements until [startSplit] returns true for a new element.
   Iterable<List<T>> splitAround(bool Function(T) startSplit, bool Function(T) endSplit) sync* {
     List<T>? list = [];
     bool first = true;
     for (var element in this) {
       if (list != null) {
         if (startSplit(element)) {
           if (list.isNotEmpty) yield list; 
           list = null;
         } else {
           list.add(element);
           continue;
         }
       }
       if (endSplit(element)) {
         list = [element]; 
       }
     }
     if (list != null && list.isNotEmpty) yield list;
   }
}

This is a completely general "snip elements into non-overlapping sublist" function.

Even with that generality, I'm not sure it's actually needed often enough that it belongs in package:collection.

insinfo commented 1 year ago
//my implementation
 List<List<T>> splitList<T>(List<T> inputList, dynamic splitBy) {
    var list = inputList.last == splitBy ? inputList : [...inputList, splitBy];
    var results = <List<T>>[];
    var result = <T>[];
    for (var item in list) {
      if (item == splitBy) {
        if (result.isNotEmpty) {
          results.add(result);
        }
        result = [];
      } else {
        result.add(item);
      }
    }
    return results;
  }

var data = [0, 0, 0, 10, 83, 67, 82, 65, 77, 45, 83, 72, 65, 45, 50, 53, 54, 45, 80, 76, 85, 83, 0, 83, 67, 82, 65, 77, 45, 83, 72, 65, 45, 50, 53, 54, 0, 0];

var dataParts =  splitAround(data, (v) => v == NULL_BYTE, (v) => v != NULL_BYTE).toList();
print(dataParts);

//Result: [[10, 83, 67, 82, 65, 77, 45, 83, 72, 65, 45, 50, 53, 54, 45, 80, 76, 85, 83], [83, 67, 82, 65, 77, 45, 83, 72, 65, 45, 50, 53, 54]]

var dataParts = splitList(data, NULL_BYTE);
print(dataParts);
//Result: [[10, 83, 67, 82, 65, 77, 45, 83, 72, 65, 45, 50, 53, 54, 45, 80, 76, 85, 83], [83, 67, 82, 65, 77, 45, 83, 72, 65, 45, 50, 53, 54]]
 var stopwatch = Stopwatch()..start();

  var data = [0, 0, 0, 10, 83, 67, 82, 65, 77, 45, 83, 72, 65, 45, 50, 53, 54, 45, 80, 76, 85, 83, 0, 83, 67, 82, 65, 77, 45, 83, 72, 65, 45, 50, 53, 54, 0, 0];

  var count = 0;
  List dataParts;
  while (true) {
    //executed in 0:00:09.339266
    // dataParts =
    //     Utils.splitAround(data, (v) => v == NULL_BYTE, (v) => v != NULL_BYTE)
    //         .toList();

    //executed in 0:00:04.929821
    dataParts = Utils.splitList(data, NULL_BYTE);

    if (count == 10000000) {
      break;
    }
    count++;
  }
  print(dataParts[0]);
  print(dataParts[1]);
  print('executed in ${stopwatch.elapsed}');

I did some tests and I saw that my implementation is much faster