geektutu / blog

Geektutu's blog. Coding, coding: creating interesting open-source projects.
https://geektutu.com
Apache License 2.0
166 stars, 21 forks

Performance comparison of for and range | High-Performance Go Programming | Geektutu #109

Open geektutu opened 3 years ago

geektutu commented 3 years ago

https://geektutu.com/post/hpg-range.html

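High-performance programming in Go (golang), an advanced Go tutorial: High Performance Go. This article compares the performance of a plain for loop and range in different scenarios and explains the underlying reason: range returns a copy of each element as it iterates, so when each element occupies a lot of memory, range is significantly slower than for. Changing the element type to a pointer solves this problem.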
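
The copy semantics described in the summary can be seen without any timing at all. The following is a minimal sketch, not code from the article: the Item type and the three helper functions are assumptions chosen for illustration. It shows that assigning to the range value only touches a copy, while indexing or ranging over pointers reaches the original element.

```go
package main

import "fmt"

// Item stands in for a large element; its fields are an assumption
// made for this sketch.
type Item struct {
	ID  int
	Val [4096]byte
}

// resetByValue assigns to the range variable, which is a copy of the element.
func resetByValue(items []Item) {
	for _, v := range items {
		v.ID = 0 // changes the copy only; items is untouched
	}
}

// resetByIndex writes through the index, so the slice itself is updated.
func resetByIndex(items []Item) {
	for i := range items {
		items[i].ID = 0
	}
}

// resetByPointer ranges over pointers: each step copies one pointer
// rather than a whole Item.
func resetByPointer(items []*Item) {
	for _, p := range items {
		p.ID = 0
	}
}

func main() {
	a := []Item{{ID: 1}, {ID: 2}}
	resetByValue(a)
	fmt.Println(a[0].ID, a[1].ID) // 1 2

	resetByIndex(a)
	fmt.Println(a[0].ID, a[1].ID) // 0 0

	b := []*Item{{ID: 1}, {ID: 2}}
	resetByPointer(b)
	fmt.Println(b[0].ID, b[1].ID) // 0 0
}
```

Whether the per-element copy is actually measurable depends on the element size and on what the compiler can optimize away.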

rxda commented 3 years ago

I suggest adding a benchmark for for k := range ...; in theory it should perform the same as for i := 0 ...

weichangdong commented 3 years ago
len := len(items)

Isn't writing it this way a bit of a bad idea?

geektutu commented 3 years ago

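@RXDA Thanks for the suggestion. I have added the case where range iterates over the index only, and it performs the same as for. The main point of this section is that when range iterates over the values directly, copying each value takes time.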
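
For reference, the three loop forms discussed in this thread can be put side by side. This is only a sketch under assumptions: the Item type and the function names are hypothetical and are not taken from the article. All three compute the same result; only the last one asks range for a copy of each element.

```go
package main

import "fmt"

// Item is a hypothetical large element type used for illustration.
type Item struct {
	ID  int
	Val [4096]byte
}

// sumFor uses a classic index loop.
func sumFor(items []Item) int {
	total := 0
	length := len(items)
	for i := 0; i < length; i++ {
		total += items[i].ID
	}
	return total
}

// sumRangeIndex ranges over the index only and reads the element in place.
func sumRangeIndex(items []Item) int {
	total := 0
	for i := range items {
		total += items[i].ID
	}
	return total
}

// sumRangeValue ranges over the values, so v is a copy of each element.
func sumRangeValue(items []Item) int {
	total := 0
	for _, v := range items {
		total += v.ID
	}
	return total
}

func main() {
	items := make([]Item, 4)
	for i := range items {
		items[i].ID = i + 1
	}
	fmt.Println(sumFor(items), sumRangeIndex(items), sumRangeValue(items)) // 10 10 10
}
```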

geektutu commented 3 years ago
len := len(items)

Isn't writing it this way a bit of a bad idea?

@weichangdong Thanks for the suggestion; I have changed it to length.
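
A likely reason for the concern is that len := len(items) declares a local variable that hides the builtin len for the rest of the scope. A small sketch of the difference, with hypothetical function names:

```go
package main

import "fmt"

// countShadowed compiles, but after the declaration the name len refers
// to an int variable in this scope, not to the builtin function.
func countShadowed(items []int) int {
	len := len(items) // the right-hand side still uses the builtin
	// Calling len(items) again here would be a compile error,
	// because len is now an int.
	return len
}

// countClear uses a distinct name, so the builtin stays available.
func countClear(items []int) int {
	length := len(items)
	count := 0
	for i := 0; i < length; i++ {
		count++
	}
	return count
}

func main() {
	items := []int{1, 2, 3}
	fmt.Println(countShadowed(items), countClear(items)) // 3 3
}
```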

FVCharm commented 3 years ago

Where is the function generateItems(1024) defined?

geektutu commented 3 years ago

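Where is the function generateItems(1024) defined?

@FVCharm Fixed; here is the commit. The web page refreshes automatically after a day. I had not pasted the full code before. The code is also archived in the Git repository: code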

attitudefx7 commented 3 years ago

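Hi Tutu, while working through LeetCode a second time in Go, I noticed that submissions using for and range got different "beats" percentages, so I wanted to look into the performance difference between the two, and that is how I found your post. May I repost it? I will include a link to the original.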

YunxiangHuang commented 2 years ago

Hello~ The benchmarks seem to be missing a call to ResetTimer, so the measured time includes the time spent in generateWithCap.
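
A minimal sketch of the pattern being suggested: do the setup first, then call b.ResetTimer so that only the loop is measured. The Item type, generateItems, and sumIDs below are hypothetical stand-ins, not the article's helpers. The sketch drives the benchmark with testing.Benchmark so that it runs as an ordinary program; inside a _test.go file the function would simply be named BenchmarkXxx.

```go
package main

import (
	"fmt"
	"testing"
)

// Item is a hypothetical element type for this sketch.
type Item struct {
	ID  int
	Val [4096]byte
}

// generateItems is a stand-in for the setup helper; it is an assumption,
// not the article's implementation.
func generateItems(n int) []*Item {
	items := make([]*Item, 0, n)
	for i := 0; i < n; i++ {
		items = append(items, &Item{ID: i})
	}
	return items
}

// sumIDs is the work we actually want to measure.
func sumIDs(items []*Item) int {
	total := 0
	for _, p := range items {
		total += p.ID
	}
	return total
}

// sink keeps the result alive so the measured call is not discarded.
var sink int

func benchmarkSum(b *testing.B) {
	items := generateItems(1024) // setup: should not be part of the measurement
	b.ResetTimer()               // zero the elapsed time and allocation counters
	for i := 0; i < b.N; i++ {
		sink = sumIDs(items)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println(testing.Benchmark(benchmarkSum))
}
```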

YouXam commented 2 years ago

@geektutu

len := len(items)

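Isn't writing it this way a bit of a bad idea?

@weichangdong Thanks for the suggestion; I have changed it to length.

This has not been changed yet in section 2.1 ([]int).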

TCP404 commented 2 years ago

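Has this been optimized in the meantime? Why do I see hardly any difference when I run it? Even between a slice of structs and a slice of struct pointers there is hardly any difference.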

package main

import (
    "math/rand"
    "testing"
    "time"
)

const N = 1 << 5

type Per struct {
    ins [2048]byte
    age int
}

func geneS(n int) []Per {
    persons := make([]Per, 0, n)
    rand.Seed(time.Now().Unix())
    for i := 0; i < n; i++ {
        persons = append(persons, Per{age: rand.Int(), ins: [2048]byte{'a'}})
    }
    return persons
}

func genePS(n int) []*Per {
    persons := make([]*Per, 0, n)
    rand.Seed(time.Now().Unix())
    for z := 0; z < n; z++ {
        persons = append(persons, &Per{age: rand.Int(), ins: [2048]byte{'a'}})
    }
    return persons
}

func S_For(persons []Per) {
    n := len(persons)
    for i := 0; i < n; i++ {
        _ = persons[i].age
    }
}

func S_RangeI(persons []Per) {
    for i := range persons {
        _ = persons[i].age
    }
}

func S_RangeIV(persons []Per) {
    for i, v := range persons {
        _, _ = i, v.age
    }
}

func BenchmarkS_For(b *testing.B) {
    persons := geneS(N)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        S_For(persons)
    }
}

func BenchmarkS_RangeI(b *testing.B) {
    persons := geneS(N)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        S_RangeI(persons)
    }
}
func BenchmarkS_RangeIV(b *testing.B) {
    persons := geneS(N)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        S_RangeIV(persons)
    }
}

func PS_For(persons []*Per) {
    n := len(persons)
    for i := 0; i < n; i++ {
        _ = persons[i].age
    }
}

func PS_RangeI(persons []*Per) {
    for i := range persons {
        _ = persons[i].age
    }
}

func PS_RangeIV(persons []*Per) {
    for i, v := range persons {
        _, _ = i, v.age
    }
}

func BenchmarkPS_For(b *testing.B) {
    persons := genePS(N)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        PS_For(persons)
    }
}

func BenchmarkPS_RangeI(b *testing.B) {
    persons := genePS(N)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        PS_RangeI(persons)
    }
}
func BenchmarkPS_RangeIV(b *testing.B) {
    persons := genePS(N)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        PS_RangeIV(persons)
    }
}
go version go1.16.6 linux/amd64

$ go test . -bench=S_ -benchmem   
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz
BenchmarkS_For-4                74827876                14.12 ns/op            0 B/op          0 allocs/op
BenchmarkS_RangeI-4             66874195                17.93 ns/op            0 B/op          0 allocs/op
BenchmarkS_RangeIV-4            68596330                16.62 ns/op            0 B/op          0 allocs/op
BenchmarkPS_For-4               72429004                16.66 ns/op            0 B/op          0 allocs/op
BenchmarkPS_RangeI-4            63810568                18.65 ns/op            0 B/op          0 allocs/op
BenchmarkPS_RangeIV-4           64354776                18.65 ns/op            0 B/op          0 allocs/op
PASS
ok      8.086s

With const N = 1 << 20, I get the following results:

go version go1.16.6 linux/amd64

$ go test . -bench=S_ -benchmem        
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz
BenchmarkS_For-4                    2842            426327 ns/op               0 B/op          0 allocs/op
BenchmarkS_RangeI-4                 2817            421695 ns/op               0 B/op          0 allocs/op
BenchmarkS_RangeIV-4                2839            431870 ns/op               0 B/op          0 allocs/op
BenchmarkPS_For-4                    121           9914194 ns/op               0 B/op          0 allocs/op
BenchmarkPS_RangeI-4                 100          10077099 ns/op               0 B/op          0 allocs/op
BenchmarkPS_RangeIV-4                100          13032966 ns/op               0 B/op          0 allocs/op
PASS
ok      16.817s
GalingLau commented 2 years ago

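When comparing the performance of for and range, you used a slice for the int case but an array for the struct case. Isn't that an unfair comparison? In my own tests, when int or struct values are stored in a slice, for and range perform about the same; but when int or struct values are stored in an array, for and range differ greatly in performance.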

package main

import "testing"

type Item struct {
    Id  int
    Val [4096]byte
}

func BenchmarkForStructArray(b *testing.B) {
    var items [1024]Item
    for i := 0; i < 1024; i++ {
        items[i].Id = i
    }
    b.ResetTimer() // reset after setup so that initialization is not timed
    for i := 0; i < b.N; i++ {
        length := len(items)
        for k := 0; k < length; k++ {
            _ = items[k]
        }
    }
}

func BenchmarkRangeStructArray(b *testing.B) {
    var items [1024]Item
    for i := 0; i < 1024; i++ {
        items[i].Id = i
    }
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        for j, v := range items {
            _, _ = j, v
        }
    }
}

func BenchmarkForStructSlice(b *testing.B) {
    // make([]Item, 1024) already contains 1024 zero-valued elements,
    // so after the appends below the slice holds 2048 elements.
    items := make([]Item, 1024)
    for i := 0; i < 1024; i++ {
        items = append(items, Item{Id: i})
    }
    b.ResetTimer()
    for a := 0; a < b.N; a++ {
        length := len(items)
        for i := 0; i < length; i++ {
            _, _ = i, items[i]
        }
    }
}

func BenchmarkRangeForStructSlice(b *testing.B) {
    // As above: 1024 zero-valued elements plus 1024 appended ones.
    items := make([]Item, 1024)
    for i := 0; i < 1024; i++ {
        items = append(items, Item{Id: i})
    }
    b.ResetTimer()
    for a := 0; a < b.N; a++ {
        for i, v := range items {
            _, _ = i, v
        }
    }
}
goarch: arm64
BenchmarkForStructArray
BenchmarkForStructArray-8                3350511               330.5 ns/op
BenchmarkRangeStructArray
BenchmarkRangeStructArray-8                 6808            166979 ns/op
BenchmarkForStructSlice
BenchmarkForStructSlice-8                1790014               654.0 ns/op
BenchmarkRangeForStructSlice
BenchmarkRangeForStructSlice-8           1823648               653.2 ns/op
PASS
ok      command-line-arguments  6.726s
dablelv commented 2 years ago

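@GalingLau When comparing the performance of for and range, you used a slice for the int case but an array for the struct case. Isn't that an unfair comparison? In my own tests, when int or struct values are stored in a slice, for and range perform about the same; but when int or struct values are stored in an array, for and range differ greatly in performance.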

I tried it on my side, and it is indeed the case. I wonder why the author has stopped updating the post.

yrka5180 commented 2 years ago

@dablelv

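@GalingLau When comparing the performance of for and range, you used a slice for the int case but an array for the struct case. Isn't that an unfair comparison? In my own tests, when int or struct values are stored in a slice, for and range perform about the same; but when int or struct values are stored in an array, for and range differ greatly in performance.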

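I tried it on my side, and it is indeed the case. I wonder why the author has stopped updating the post.

Because under the hood a slice holds a pointer to its backing array, so ranging over a slice does not copy the underlying array, whereas ranging over an array value does.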

YanCunJu commented 1 year ago

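Unlike with slices, if a key-value pair that has not yet been reached is deleted during iteration, that pair will not be iterated.

Actually, elements can also be deleted and appended while iterating over a slice. Strictly speaking, when iterating over a slice the length is computed once at the start, whereas iteration over a map reflects changes made while the loop is running.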

package main

import "fmt"

func main() {
    words := []string{"Go", "语言", "高性能", "编程"}
    for i, s := range words {
        words = append(words[:1], words[2:]...)
        words = append(words, "test")
        fmt.Println(i, s)
    }
}
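
The same point can be reduced to counting iterations. The sketch below uses hypothetical helper names: it appends to a slice while ranging over it, and deletes map entries while ranging over the map. The slice loop runs once per original element, and the map loop stops early because entries removed before they are reached are not produced.

```go
package main

import "fmt"

// sliceIterations appends during the loop. The number of iterations is
// fixed by the length the slice had when range was evaluated.
func sliceIterations() int {
	s := []int{1, 2, 3}
	n := 0
	for range s {
		s = append(s, 0)
		n++
	}
	return n
}

// mapIterations deletes every other key during the first iteration, so
// no further entries are left to visit.
func mapIterations() int {
	m := map[int]bool{1: true, 2: true, 3: true}
	n := 0
	for k := range m {
		for _, other := range []int{1, 2, 3} {
			if other != k {
				delete(m, other)
			}
		}
		n++
	}
	return n
}

func main() {
	fmt.Println(sliceIterations(), mapIterations()) // 3 1
}
```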